EDITORIAL


A regional approach to save the Amazon 


Early in August this year, a high-profile summit was held in Belém, Brazil, where the eight Amazonian countries discussed the future of the Amazon. The nations recognized that the Amazon is very close to reaching a tipping point for turning into a degraded ecosystem. The result of their discussions was the Belém Declaration, an ambitious plan to protect and conserve the Amazon forests and to support Indigenous Peoples and local communities. Concern arose, however, because they failed to agree on attaining zero deforestation by 2030 and on avoiding new explorations in the Amazon for fossil fuel. The Declaration also lacks specific and measurable indicators. The ministers of Foreign Affairs therefore have a very important role in further refining the agenda and deadlines so that the Belém Declaration can be implemented.

For over three decades, science has pointed to the risks of the Amazon reaching a tipping point. Several recent studies now demonstrate how close it is: The dry season over the southern Amazon has lengthened by 4 to 5 weeks over the past 40 years, the mortality of wet-loving tree species has increased, and the loss of trees is turning the forests into a carbon source rather than a carbon sink.

The Amazon is an integrated system at various spatial and temporal scales. For example, the transport of water vapor in the atmosphere (so-called “flying rivers”) moves water within the Amazon as well as from the Amazon Basin to other parts of South America. Up to 40% of precipitation in the western Amazon and the Andes depends on water that is recycled by forests in the eastern and central Amazon. As precipitation generated by these flying rivers decreases with deforestation, decisions made in Brazil regarding deforestation have implications for other Amazonian countries and for South America overall. There is a need to look at the Amazon from a regional perspective first, to better conceive of integrated solutions. A regional focus of the Belém Declaration is therefore a key step toward its implementation.

Collaboration between the eight Amazonian countries is essential to achieve the Belém goals. For example, concerted actions are needed to stop illegal undertakings, such as drug trafficking, illegal mining, and land grabbing. These activities easily cross national borders without much control. The good news is that some collaborations already exist, such as the implementation across all the Amazonian countries of methods developed in Brazil to track deforestation, forest degradation, fires, and mining in real time. But there is more to be done. Research collaboration across Amazonian countries is crucial to gain a better understanding of the socioeconomic and ecological processes that occur at larger spatial scales. This includes, for example, the effect of dams on both upstream fish migration and the livelihoods of local communities. Regional collaboration is also key to promoting a common Amazonian agenda on the global stage, such as at the upcoming United Nations Climate Change Conference (COP28, 2023) in Dubai, where international partnership and financial support can be negotiated.

The Belém Declaration also promotes seeking alternatives to current economic development models. Creating alternatives rests on large investments in research, innovation, and technology and will require joining scientific knowledge with traditional knowledge of Indigenous Peoples and local communities. One such alternative is a socio-bioeconomy based on the sustainable use of natural ecosystems to support the well-being, knowledge, rights, and territories of the Amazonian inhabitants. The diversity of realities across the Amazon is large, so there are no single or simple solutions for these challenges. Spaces for dialogue are therefore needed to exchange knowledge and experiences.

Several research-based recommendations in the Declaration collectively point to the need to continue providing Amazonian governments with information to implement its suggestions. The importance of science in decision-making is indicated by the intention to create an Intergovernmental Science Panel on the Amazon, a forum that the existing Science Panel for the Amazon (created by the UN Sustainable Development Solutions Network in September 2019) can work with.

The ministers of Foreign Affairs of the Amazon Cooperation Treaty (ACT) must work together now to push the Declaration into action in the coming years. And the international community should support this effort, both financially and politically. Collective action is needed to save the Amazon.

—Marielos Peña-Claros and Carlos Nobre
A new x-ray source, the Linac Coherent Light Source II, is powered by a 1-kilometer-long, superconducting accelerator.


United States reclaims the lead in x-ray lasers 


The United States this week regained from Europe an edge in x-ray research as SLAC National Accelerator Laboratory turned on a new $1.1 billion x-ray free electron laser (XFEL), a facility for probing matter in all its forms. The Linac Coherent Light Source II (LCLS-II) is expected to generate 1 million x-ray flashes per second, more than any other XFEL. In an XFEL, a linear accelerator, or linac, shoots electrons through magnets that shake the particles sideways and make them emit x-rays. The x-rays then nudge the electrons into tiny bunches that radiate far more efficiently, producing bursts of x-ray laser light. In 2009, SLAC unveiled the first XFEL, the LCLS, which was fed electrons from the lab’s decades-old linac and generated 120 x-ray flashes per second. In 2017, the €1.2 billion European XFEL blew past that mark, using a superconducting linac that can produce 27,000 x-ray flashes per second. To top that rate, the LCLS-II also employs a superconducting linac. In 2027, China aims to complete an XFEL that will rival both the LCLS-II and the European XFEL.


Conservationist released by Iran 


HUMAN RIGHTS | The five U.S. citizens released by Iran this week in a controversial prisoner swap include businessman Morad Tahbaz, jailed in 2018 on charges that a wildlife conservation group he co-founded was spying. Iran’s Islamic Revolutionary Guard Corps had accused him and eight other people affiliated with the Persian Wildlife Heritage Foundation, a nonprofit in Tehran, of using wildlife camera traps to spy on military installations. Hundreds of conservationists and scholars from 66 countries protested the charges, which the administration of former Iranian President Hassan Rouhani determined were unfounded. Seven other foundation workers are still serving prison terms, and an eighth died in custody before his court case was completed.


NIH keeps notebook mandate 


RESEARCH INTEGRITY | The U.S. National Institutes of Health is standing firm on a controversial new policy requiring grant recipients abroad to periodically share lab notebooks and other raw data with their NIH-funded partner in the United States. The mandate, a response to two federal audits urging better oversight of foreign subawards, follows criticism of China’s Wuhan Institute of Virology (WIV) for not responding to NIH requests for lab notebooks. Some critics have claimed, without any direct evidence, that WIV started the COVID-19 pandemic through a lab leak, and this year NIH banned it from receiving agency funding. Nearly 500 researchers and groups had complained that an earlier version of NIH’s notebook policy, released in May, was costly and unnecessary and would erode trust in global health research. In the final version, NIH eased some of the original requirements; data can be shared electronically once a year instead of every few months as originally proposed. Some researchers say it will still harm long-standing collaborations.


Diphtheria surges in Nigeria 


PUBLIC HEALTH | Nigeria logged nearly 6000 cases of diphtheria in July and August, more than were reported globally in all of 2022, the World Health Organization (WHO) announced last week. The bacterial disease is spread by respiratory droplets and direct contact and can be prevented by a vaccine. It has a case fatality rate as high as 40% if not treated with antitoxin—supplies of which are “very constrained,” WHO reported. The organization says it has shipped 4500 of the 14,200 vials that Nigeria has requested. Algeria, Guinea, and Niger have reported smaller outbreaks recently. WHO says Guinea and four other countries in addition to Nigeria have separately requested 4000 vials. WHO purchases antitoxin, made in horses, from just two companies, and supply has been problematic for a decade. Almost all of Nigeria’s infections were in the country’s politically unstable north. Nearly 75% occurred in children younger than 15, just 23% of whom were fully vaccinated, despite recommendations that all young children be inoculated.


The stuff it takes to end poverty 


EQUITY | Lifting the 1.2 billion poorest people out of poverty by establishing decent living conditions for them may require about 43 billion tons of raw materials, according to the first such estimate. Researchers calculated the mass of all the concrete, steel, and other supplies to build the necessary housing, hospitals, schools, and other basic infrastructure. The team separately estimated the quantity needed annually to provide the poor with sufficient calories, transportation, clothing, and communications; this total, 7.2 billion tons, is just 8% of what the rest of humanity consumes for those purposes each year, they reported last week in Environmental Science & Technology. Policymakers trying to reduce poverty need such metrics, which are similar to those available for other policy challenges, says Stefan Bringezu of the University of Kassel, who was not involved in the study.


Popular biodiversity app to expand 


The nonprofit that runs iNaturalist, a popular app and website for identifying species, has received a $10 million grant to expand. The funding from the Gordon and Betty Moore Foundation, announced last week, will allow iNaturalist—whose website is one of the largest generators of crowd-sourced species-occurrence data—to add users, technology, and observations to inform conservation. iNaturalist hopes to grow in nature-rich parts of the world, such as Asia and South America, which have fewer users uploading data. Since iNaturalist’s founding in 2008, the platform has recorded more than 150 million verified observations, and its data have been tapped in more than 4000 scientific papers. The nonprofit has a $3 million budget.
ARPA-H’S CANCER PUSH President Joe Biden's 1-year-old agency to fund high-risk biomedical research is taking a starring role in his reignited Cancer Moonshot, an effort to halve the U.S. cancer death rate by 2047. The White House said last week the $1.5 billion Advanced Research Projects Agency for Health will spend an additional $240 million of its budget this year on moonshot-related projects, including surgery improvements, tools to detect tumors early, and bacterial treatments.


UFO WATCH NASA has appointed a director of research on UFOs, or “unidentified anomalous phenomena” (UAPs). The new director, Mark McInerney, served as NASA's UAP liaison with the Department of Defense. NASA is also considering the recommendations from an advisory report, released last week, on how to study UAPs; they include encouraging volunteers to report sightings on their smartphones and drawing on data from existing Earth-observing satellites.


PARTY LINE The Chinese Academy of 
Sciences has updated its code of conduct 
to require its 800 members to tailor their 
public statements to be “in line with the 
general policy of the Central Committee of 
the Communist Party of China.” Academy 
members are also barred from openly 
expressing scientific views unrelated to their 
field of expertise. 


PIG-TO-HUMAN TRANSPLANT Physicians at New York University last week announced they removed a genetically modified pig kidney they had implanted 2 months earlier in the body of a 58-year-old who had been declared neurologically dead. The longest such trial to date, it was part of research meant to eventually win regulators’ approval for transplanting such animal organs into live humans. More than 88,000 U.S. residents are on a waiting list for a donated human kidney.




RESEARCHGATE SETTLEMENT Scientific publishing giants Elsevier and the American Chemical Society (ACS) have settled a copyright-infringement lawsuit they brought in 2017 against the ResearchGate networking platform after many scientists posted the full text of research articles there. The publishers said the uploads illegally included millions of paywalled papers. In a joint statement last week, the parties said ResearchGate has agreed to a system that automatically checks the copyright of articles from those publishers when users upload them. Users will be allowed to privately send paywalled papers to other users upon request.
SPACE PHYSICS


Solar max set to arrive sooner than expected—and stronger


A feisty Sun threatens satellites and electric grids, highlighting need for better forecasts


By Zack Savitsky 


In 2019, as the Sun approached a minimum in its 11-year cycle of magnetic activity, a dozen scientists assembled for a traditional exercise: forecasting the next peak. Now, a few years into the Sun’s resurgence, it’s becoming clear that the official prediction from the panel, convened by NASA, the National Oceanic and Atmospheric Administration (NOAA), and the International Space Environment Service (ISES), missed the mark. The Sun’s activity has already surpassed the forecast, reaching levels not seen in 20 years, and solar maximum may arrive within the next year, months ahead of its presumed schedule. “Obviously the panel underestimated it,” says Ilya Usoskin, a physicist at the University of Oulu.

The discrepancy highlights a need for better observations of the Sun. It may also point to unknown factors influencing the churning dynamo of ionized gas that gives rise to the Sun’s magnetic field. “I’d like to think we’re making progress in terms of understanding the dynamo, but there’s work to do,” says Mark Miesch, a solar physicist at the University of Colorado Boulder.

The stakes are high. At peak activity, the Sun more often unleashes particle storms that crash into Earth, threatening satellites, jamming radio transmissions, and overloading power grids. Because the previous cycle was unusually mild, “we’ve been lulled into a false sense of complacency,” says Tamitha Skov, a heliophysicist at Millersville University.

Scientists typically track solar cycles by counting sunspots—dark patches of intense magnetic activity spurred by knots of magnetic field loops. The sunspot number climbs over the course of a solar cycle, then drops near zero as magnetic activity subsides. When the NASA-NOAA-ISES prediction panel met in 2019, it analyzed about 60 different forecast models, each offering an estimate for the peak number of sunspots and when it would arrive (Science, 31 May 2019, p. 818).

Some of the models are purely statistical, making forecasts by extrapolating centuries of sunspot observations. Others rely on observable “precursors” thought to be correlated with the solar cycle, such as the strength of the magnetic field at the Sun’s poles at solar minimum. As the cycle progresses, that “seed field” gets more powerful as its field lines are wound up into a doughnut shape by the way the Sun rotates—faster at the equator than at the poles. A third category relies on advanced computer models that work like climate models, ingesting as much observable data as possible and then using the laws of physics to simulate the Sun’s dynamo and shifting magnetic fields.


Elevated pulse

The Sun's upcoming cycle looks to be the strongest in 20 years, contrary to forecasts, which also missed how quickly solar maximum is approaching.

After a week of discussing the merits of different approaches, the panel voted and hashed out a consensus: The monthly sunspot count would peak at about 115, sometime around July 2025—making it a relatively weak cycle, much like the preceding one. But the Sun has already woken up faster and is feistier than expected. It sported 159 sunspots in July and 115 in August.

“Did we get it absolutely right? No,” says Lisa Upton, a physicist at the Southwest Research Institute who co-chaired the panel. “But considering the level of uncertainty that’s associated with what we’re trying to do here, it’s actually a quite good prediction.”

Upton believes one reason the panel’s prediction fell short is the quality and longevity of the observations that feed and drive the precursor and dynamo models—most importantly, the strength of the polar magnetic field. Those values come primarily from the Wilcox Solar Observatory in California, which can see the imprint of the polar field on the spectrum of sunlight. But the telescope has relatively poor resolution and a limited view. NASA mission concepts such as Firefly and Solaris would send spacecraft closer to the Sun to probe its polar fields directly, but they’re still in the development phase.

Other researchers suspect a deeper snag. The relationship between polar magnetic fields and subsequent solar activity is drawn from measurements spanning only a few decades, and other factors may be at work. Clues are coming from observations led by Scott McIntosh, a solar physicist and deputy director of the National Center for Atmospheric Research.

On 10 January, a flare erupted on the Sun (upper left edge)—a sign of increasing magnetic activity.

For 2 decades, he and colleagues have tracked millions of “bright points” in extreme-ultraviolet images of the Sun that they think trace bands of magnetic field traveling under the Sun’s skin. The bright points seem to follow a pattern across two solar cycles: Clusters routinely emerge at midlatitudes at the start of the first solar cycle. They then migrate toward the equator as the solar activity peaks, falls, and peaks again. At the end of the second cycle, the points suddenly disappear in what the researchers call a “terminator event.” Just after this event, the bright points reappear at midlatitudes and start the cycle afresh.

McIntosh believes the double-cycle pattern means the underlying field bands from successive cycles must be interacting—sometimes constructively, leading to increased solar activity. And he thinks the timing of consecutive terminator events can be used to forecast this interference—and the height and timing of the next solar maximum. After spotting the most recent terminator event in December 2021, he and colleagues predicted this cycle’s sunspots would peak at about 184 sometime near early 2024.

“It’s a fascinating pattern and something that will challenge dynamo theory,” Miesch says. Dibyendu Nandi, an astrophysicist at the Indian Institute of Science Education and Research Kolkata who worked on dynamo models used in the 2019 panel prediction, doesn’t buy the predictive power of the terminator events. However, he does believe that bright points may be an important signal. “I think they’re onto something,” he says.

The dynamo simulations have come a long way in the past decade and now predict the polar seed fields pretty well, Nandi says. He concedes that if overall solar activity continues to ramp up far beyond predictions, scientists might have to reconsider whether polar fields are really the only thing driving the solar cycle. Perhaps, as McIntosh’s observations suggest, the interaction of lingering magnetic fields in the Sun’s interior leaves a footprint on the next cycle. Nandi is now investigating that possibility in his models.

“If there’s one certainty in this field of prediction,” Nandi says, “it’s that we should be always ready to be proved wrong and go back to our drawing boards.”


Zack Savitsky is a journalist in Reno, Nevada. 
Possible misconduct in papers from Italian health minister


Orazio Schillaci denies responsibility for duplicated images 
in eight papers he co-authored between 2018 and 2022 


By Michele Catanzaro 


An Italian newspaper has found duplicated images in eight cancer papers co-authored by the country’s minister of health, Orazio Schillaci. Schillaci, a physician with a Ph.D. in nuclear medicine, published the papers between 2018 and 2022 while working in the faculty of medicine of the University of Rome Tor Vergata.

The duplications, reported last week by Il Manifesto, include cases in which the same image is presented as showing cells from different tissues or cancers and images supposedly representing cells from different patients that are in reality the same image with a change of scale. Science has confirmed the evidence with image integrity experts.

Schillaci did not respond to a request for comment. But at a public event on 14 September, he shrugged off the reports. “I am not worried. I have not manipulated anything,” he said. “The images do not come from my laboratory, but from other colleagues that have not done anything wrong.” In a statement to the Italian news agency ANSA, Schillaci’s co-author Manuel Scimeca, a researcher in Tor Vergata’s Department of Experimental Medicine, said the duplications were the result of errors made while uploading images, and promised to inform the relevant journals after looking further into the issue.

Image integrity experts say there’s no doubt about the duplications, though whether they were intentional is unclear. “It may be sloppiness in keeping track of each picture, or intentionality, because the pictures always fit the narrative [of the paper],” says Elisabeth Bik, a science integrity consultant who also identified another case of image duplication following the initial reports from Il Manifesto and Science. “In any case, this casts doubts on the accuracy of other experimental findings of this lab.”

The Italian minister of health publishes prolifically on nuclear medicine.

Schillaci joined Tor Vergata in 2001, becoming dean of the faculty of medicine in 2013 and rector of the university in 2019. He is a prolific author, with more than 400 papers registered in Scopus, a database of scientific literature. During the years in which the papers with duplications were published, he produced papers at a rate of one every 12 days; he has continued to publish since becoming health minister for Italy’s far-right government in 2022.

These prolific publishing rates drew the attention of Il Manifesto, a left-wing newspaper, which decided to check the quality of the minister’s work. In its investigation, the paper used software called ImageTwin to detect any evidence of picture duplication within a sample of papers co-authored by Schillaci.

In the analysis, which Science has seen, eight papers popped out as problematic. These include a 2021 paper published in the Journal of Clinical Medicine, which explored the potential of a radiographic technique for tracking the distribution of prostate cancer drugs. An image said to show prostate cancer cells in mice is identical to an image in a 2019 study—also co-authored by Schillaci—that purports to show breast cancer cells.

Another paper, published in 2019 in the 
International Journal of Molecular Sciences, 
looked at the development in breast cancer 
of cells that produce calcium deposits, like 
bone cells. A picture of the breast tissue cells 
in that paper is identical to one of actual 
bone cells in a different paper, published in 
2018 by one of Schillaci’s co-authors, on the 
effect of microgravity on bones. 


In some cases, images are duplicated within a single paper. For instance, in a 2018 paper, published in Contrast Media & Molecular Imaging, a picture of prostate cancer cells is labeled as coming from a patient with bone metastasis, and is then displayed again at a different scale and labeled as coming from a nonmetastatic patient.

Both Bik and Jennifer Byrne, professor of molecular oncology at the University of Sydney, say the duplications are not the result of legitimate experimental procedures.

The duplications may well be inadvertent, says Mike Rossner, president of the Image Data Integrity consultancy. “It is possible that the author just grabbed the wrong file when preparing that figure panel,” he says. But even if they are simple errors, Byrne says, “When one group seems to be making such errors repeatedly, this could indicate that their data-handling processes may have been flawed.”

Given that so many papers are affected, the university should investigate, Bik says. “You absolutely need an independent panel,” agrees Daniele Fanelli, an expert in research integrity at the London School of Economics and Political Science. “What ought to happen in any reputable scientific institution is that an independent committee without conflicts of interest [should] investigate and then issue corrections, or sanctions if necessary.”

Tor Vergata did not respond to a request for comment. In a press release from the university, Rector Nathan Levialdi Ghiron defended the quality of the institution’s research and said the authors of the papers under scrutiny are now analyzing the original data to check that the conclusions hold up.

It remains unclear who added the duplicated images. Schillaci is listed as corresponding author on four of the papers under scrutiny, but three other researchers at Tor Vergata, including Scimeca, also appear on all eight publications. And according to declarations made in the papers, Schillaci’s contribution varied from coming up with the idea for the work, to correcting and reviewing it, to doing the research and writing.

Still, the duplications raise the question of whether being a rector—or a cabinet minister—is compatible with highly productive experimental activity. It’s the question the scientific community has been grappling with since the president of Stanford University, Marc Tessier-Lavigne, announced his resignation in July following an investigation into practices in his laboratory. “You cannot do two jobs and do them both well,” Bik says.


Michele Catanzaro is a journalist based in 
Barcelona, Spain. 
Criticism builds against Ph.D. careers firm Cheeky Scientist


Researchers say they felt pushed into signing away 
thousands of dollars for services they ultimately didn’t want 


By Catherine Offord 


When Sara saw a LinkedIn ad earlier this year for a company promising to help science, technology, engineering, and math (STEM) Ph.D.s transition into lucrative industry careers, she thought she had nothing to lose by finding out more. With her postdoc coming to an end and her efforts to find a job and secure financial stability falling short, she was feeling desperate, she says. So she agreed to an introductory video call with a “transition specialist” at Cheeky Scientist, which bills itself on its website as the “world’s premier career training platform for PhDs” and claims to have helped “thousands of PhDs” move from academia to industry.

Sara describes what transpired as an aggressive sales pitch that played heavily on her anxiety about unemployment. The representative made her an offer: Cheeky Scientist’s Diamond Program, an online mentoring package, for a little over half of what he said was the standard retail price of $9998—provided she sign up immediately. He had a solution for financing, too: a high-interest loan he could help her apply for through another company, there and then.

Under pressure, Sara says, she signed up, but regretted it as soon as she was off the call. She contacted Cheeky Scientist within hours to request a cancellation. Now, she’s saddled with thousands of dollars of debt and is no closer to reclaiming her money—despite not using the company’s services.
Sara is one of a growing number of STEM graduate students and postdocs sharing their stories about Cheeky Scientist online and with journalists, and even seeking legal help to get out of contracts they say they felt pushed into signing. Science spoke with and examined documentation from five customers who committed between approximately $3000 and $8000 to join the company’s Diamond Program in the past year or so and reviewed dozens of public complaints lodged with the Better Business Bureau (BBB). (Science is not using the customers’ real names as they spoke on the condition of anonymity because of concern for their careers and fear of retaliation from Cheeky Scientist over confidentiality and nondisparagement clauses included in some service agreements.) These accounts, along with pages of internet forum posts, describe similar themes: high-pressure sales tactics, swiftly signed contracts coupled with loans that could have annual percentage rates of 20% or higher, and rejected refund and cancellation requests, regardless of whether services were used.

The Florida-based company’s CEO and founder, Isaiah Hankel—a scientist-turned-salesperson, consultant, and career adviser—argues that dissatisfied customers are in the minority, and points to hundreds of positive testimonials on Cheeky Scientist’s website. The money-back rules are clearly explained online, he says, adding that the company is open to offering refunds when things don’t work out and has done so. “In some cases, there’s not a good fit.”

There’s a ready market for companies and individual consultants offering services to Ph.D.s in the United States, where just a fraction of the more than 40,000 STEM Ph.D.s who graduate each year will secure stable posts in academia. Many use pay-as-you-go schemes with rates of up to a few hundred dollars per hour. Cheeky Scientist, which Hankel founded in 2013, is a veteran in the field, having gained early popularity through free eBooks that careers experts say contain useful advice. It has also established a presence at academic institutions across the U.S. through seminars given by Hankel, who completed his Ph.D. in anatomy and cell biology in 2011 and has written articles about STEM Ph.D. careers for publications including Nature, The Guardian, and the Harvard Business Review.

From the mid-2010s, the company began 
to offer a “Scientist MBA,” which until recently
the company described as “guaranteed to get you into management.” It now
lists programs for more than 20 “career 
tracks” supported by a staff of 16 people as 
well as program leaders and board members, 
according to its website. 

The Diamond Program, introduced a few 
years ago, is described by the company as 
an “answer [to] the problems that are causing
suffering for PhDs in academia,” and
includes “private mentorship, live meetings,
a private members-only website, app
and directory, and access to a network of 
tens of thousands of PhDs working in in- 
dustry.” Hankel declined to comment on the 
number of people enrolled in the Diamond 
Program, the company’s pricing, or the 
program’s content. 

Kim Petrie, assistant dean for biomedical 
career development at the Vanderbilt Uni- 
versity School of Medicine, calls the cost and 
payment arrangements “very troubling.” 
Her own interactions with Cheeky Scientist 
have centered on misleading content on the 
company’s website: A few years ago, Cheeky 
Scientist reused a figure from Vanderbilt de- 
scribing students’ post-Ph.D. employment, 
she says. The image had been modified to 
remove “all context ... labeling, the title ... 
everything that you would need to be able 
to actually interpret the figure.” The result 
falsely implied a large portion of the uni- 
versity’s Ph.D.s were stuck in 
endless postdocs. Cheeky Sci- 
entist removed the image after 
the university contacted the 
company, she adds. Hankel did 
not respond to questions about 
Petrie’s account. 

Jim Gould, director of the of- 
fice for postdoctoral fellows at 
the Harvard Medical School and 
Harvard School of Dental Medi- 
cine, has invited Hankel to give 
free careers talks to postdocs at 
the university and has partici- 
pated in Cheeky Scientist pod- 
casts. He says Hankel’s seminars 
are always well-received, but 
some postdocs have felt pres- 
sured by the company to pur- 
chase services. He advises researchers not 
to pay out-of-pocket when they can instead 
turn to institutions or alumni organizations 
for free career help. As for high-interest 
loans and multithousand-dollar payments, 
he says, “That would be something on principle
I wouldn’t recommend.”

Customers featured on the company 
website say they’ve benefited from Cheeky 
Scientist’s paid-for services. Eda Machado, 
senior clinical project manager at IQVIA 
Biotech, tells Science she joined the Dia- 
mond Program in 2021 at the cost of several 
thousand dollars after more than a decade 
of postdoctoral and technical roles. She 
found an industry job within months. “The 
program is so well designed in terms of ma- 
terial, coaching,” she says. She rigorously at- 
tended classes and webinars, and made use 
of the members network. “It was very well 
worth the amount of money that I paid.” 

But other stories shared with Science, 
as well as accounts posted on internet 
forums—often anonymously—and filed with 
BBB are more critical. In particular, many
customers allege they were pushed into pur- 
chasing services they didn’t want, can’t af- 
ford, and haven’t used. 

Some of their criticisms focus on the tone 
of the company’s marketing, which they say 
taps into early-career researchers’ anxiet- 
ies about their professional futures. Typical 
emails and social media posts warn that 
“the longer you stay in academia after get- 
ting your PhD, the more you DAMAGE your 
career,” and “Do you know what happens if
you fail a job interview? You get blacklisted.” 
Advertising for one program aimed at inter- 
national researchers states that “most inter- 
national PhDs do NOT have the knowledge 
they need or the network and support they 
need to get hired in the U.S.,” adding that “if
you fail to land a job [quickly enough], you'll 
essentially be designated as an illegal worker 
and required to leave the U.S.” 


Ph.D. students and other early-career 
researchers from around the country 
also consistently tell Science they felt ha- 
rangued during what they thought were 
introductory meetings and were told that 
unless they signed up right away they’d 
miss their chance to join. (Hankel says 
the program has a limited capacity.) They 
say they were told they’d be able to claim 
their money back if they hadn’t found a 
job within 12 months—but discovered on 
reading the agreement’s small print after 
signing that this depended on Cheeky Sci- 
entist deciding they’d made sufficient “ef- 
fort” to participate in its programs. Most of 
the people who spoke with Science tried to 
cancel within hours of getting off the video 
call, but were reminded by company rep- 
resentatives that they’d verbally confirmed 
they understood the agreement and had 
signed up voluntarily. 

High-pressure sales tactics or not, the law 
favors the company in these situations, says 
Jeffrey Rachlinski, a professor at Cornell Law
School. “Parties to a contract are bound to
the language of agreements they sign, even 
if they did not read them,” he says, noting 
that researchers could have left the calls at 
any time. Nevertheless, he adds, if custom- 
ers “feel that some of what was represented 
to them was fraudulent or misleading, they 
could ... report this to the consumer fraud 
division of the state attorney general.” 

This is what Terry, a Ph.D. student, did. 
They'd signed up after Cheeky Scientist 
contacted them directly through LinkedIn 
and tried to cancel within 24 hours, but 
their request was refused. The state attor- 
ney general’s office took months to act, but 
when it eventually contacted Cheeky Scien- 
tist on their behalf the company refunded 
their money within weeks. Terry says they 
want to warn other students to be more 
cautious than they were. “I really want peo- 
ple not to go through this.” 

Others have reported suc- 
cessfully engaging their univer- 
sity’s legal services, or have filed 
complaints with BBB. Their 
arguments focus on alleged 
discrepancies between the way 
representatives described the 
agreement versus what the le- 
gal text said and on nondispar- 
agement clauses that prohibit 
customers from making “any 
statement, whether verbally 
or in writing, that would dis- 
parage [Cheeky Scientist] in 
any way.” BBB states on its site
that such nondisparagement 
clauses are “inconsistent with 
BBB’s Standards for Trust, and 
they are often illegal.” Hankel 
tells Science that Cheeky Scientist’s current 
agreements do not contain this language. 

Some have also complained directly to 
Affirm, the company that handles the loans. 
Affirm declined to say how many complaints 
it has received, but told Science in a state- 
ment that it recently initiated a “review of 
this merchant, after observing a heightened 
level of consumer complaints [and] decided 
to pursue a termination with this mer- 
chant.” Affirm added, “Any consumer who 
is experiencing issues with their outstand- 
ing loan should contact Affirm through our 
Help Center to initiate a dispute.” 

Researchers still trying to reclaim their 
money tell Science they feel embarrassed at 
their situation and are eager to move on. “I 
just want this to be behind me,” says Bruno, 
a Ph.D. student who signed up after a week- 
end of “absolute panic” about his lack of 
employment. He’d always thought he could 
“sniff out” hard sells, he says. “I guess this 
one just hit a particular weak spot at a particularly
weak time.”
CRYPTOGRAPHY


Quantum algorithm offers faster 
way to hack internet encryption 


Scheme to factor giant numbers could be more efficient 
than 30-year-old Shor’s algorithm 


By Anna Kramer 


In 1994, Peter Shor created one of the first
practical uses for a quantum computer:
hacking the internet. Shor, an applied
mathematician at the Massachusetts
Institute of Technology (MIT), showed
how a quantum computer could be exponentially
faster than a classical computer
at finding the prime number factors of large 
numbers. Those primes are used 
as the secret keys that secure 
most of the encrypted informa- 
tion sent over the internet. 

For 30 years, Shor’s algo- 
rithm has endured as an exam- 
ple of the promise of quantum 
computers—although the de- 
vices are not yet big or reliable 
enough to implement it for large 
numbers. But now, a computer 
scientist has revealed a new 
quantum algorithm that might 
be better than Shor’s. In a pre- 
print first posted to the arXiv 
server on 12 August, Oded Regev 
of New York University proposes 
a scheme that could greatly re- 
duce the number of gates, or log- 
ical steps, needed to factor very 
large numbers. In principle, it 
could enable a smaller quantum 
computer to ferret out the secret 
encryption keys or a bigger ma- 
chine to decode them faster. “Is this actually 
going to have any effect?” Regev asks. “My 
feeling is that yes, it might have a chance.” 

Independent cryptographers who have 
evaluated the work are intrigued, too. Vinod 
Vaikuntanathan, a computer scientist at 
MIT, expects a packed house there next 
month when Regev will give a colloquium 
talk on his new algorithm. “In the world 
of quantum computing, essentially two or 
three new ideas have appeared so far in the 
last 30 years since Shor,” Vaikuntanathan
says. “You don’t see these new ideas ev- 
ery day, and that makes us hope.” Kenneth 
Brown, a quantum computing researcher 
at Duke University, agrees. “Because every- 
body has studied Shor’s algorithm for a long 
time, this result is surprising and super cool.” 
Like all quantum algorithms, Shor’s algorithm
relies on the mysterious properties of
quantum bits, or qubits, which can be set to
values of not only 0 and 1, but also a “superposition”
of 0 and 1 at the same time. Small
numbers of these qubits can be stitched
together into gates, which carry out the
logical operations of an algorithm. To factor
a number n bits long, Shor’s algorithm
requires a quantum circuit of n^2 gates.


Quantum computers may need millions or billions of qubits before they can hack internet encryption schemes.


Most internet encryption now relies on 
numbers of at least 2048 bits, which equate 
to decimal numbers 617 digits long. Finding 
their prime factors with Shor’s algorithm 
would therefore require quantum computers 
with at least 4 million gates. But the biggest 
quantum computers to date only have a few 
hundred qubits. “None of them are anywhere 
near the size we need to factor numbers that 
we'd care about,” Brown says. 
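The arithmetic behind those figures can be checked in a few lines. This is a minimal sketch that takes the article's n^2 gate scaling at face value and ignores constant factors and error-correction overhead; the function names are hypothetical, chosen for illustration.

```python
def decimal_digits(bits: int) -> int:
    """Number of decimal digits in the largest n-bit integer, 2**bits - 1."""
    return len(str(2**bits - 1))

def shor_gate_estimate(bits: int) -> int:
    """Rough gate count under an n**2 scaling (constant factors ignored)."""
    return bits ** 2

n = 2048
print(decimal_digits(n))      # 617
print(shor_gate_estimate(n))  # 4194304, i.e. roughly 4 million
```

Counting digits with `str` avoids floating-point rounding; the same answer follows from floor(2048 × log10 2) + 1.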

Making things worse, environmental noise 
often destroys qubits’ delicate superposition 
states, ruining the operation. The noise can 
be addressed with error correction, but that 
requires even more qubits—millions or even 
billions of them, Vaikuntanathan says. “It re- 
ally blows up because of error correction,” he 
says. “That’s why we are pretty far from actually
being able to factor 1000-digit numbers.”
Improving error correction would help, but
so would improving on Shor’s algorithm.

Regev saw a way to do that. Shor’s algo- 
rithm is 1D. It searches for the prime factors 
by raising a single number to high powers. 
Many big numbers must be multiplied to- 
gether before a result is reached. Regev re- 
alized he could multiply several numbers 
in different dimensions. The powers for 
any one number don’t get nearly as high. 
Although the two algorithms require about 
the same total number of multiplications, 
the multidimensional character of Regev’s 
means the multiplied numbers don’t get 
nearly as large before a result is reached. 
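The “raising a single number to high powers” step feeds a classical trick at the heart of Shor’s algorithm: if the period r of a^x mod N is known, a greatest common divisor often reveals a factor of N. The toy sketch below swaps the quantum period-finding step for brute force, so it only works for tiny N and shows nothing of Regev’s multidimensional variant; the helper names are hypothetical.

```python
from math import gcd

def find_order(a: int, N: int) -> int:
    """Smallest r > 0 with a**r % N == 1, by brute force.
    This is the step a quantum computer would speed up."""
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factor_from_order(a: int, N: int):
    """Try to split N using the order of a (Shor-style classical post-processing)."""
    g = gcd(a, N)
    if g != 1:
        return g, N // g   # a already shares a factor with N
    r = find_order(a, N)
    if r % 2 == 1:
        return None        # odd order: retry with another a
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None        # trivial square root of 1: retry with another a
    p = gcd(y - 1, N)
    return p, N // p

print(factor_from_order(7, 15))  # (3, 5)
```

For a = 7 and N = 15, the powers of 7 cycle with period 4, and gcd(7^2 − 1, 15) = 3 yields the factorization.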

In the end, he found he would need only 
n^1.5 gates to factor an n-bit integer. It’s the
first substantial improvement on Shor’s algorithm
in 30 years, Vaikuntanathan says.
“Nobody has really succeeded beyond shav- 
ing off a little bit.” 
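To see what dropping from n^2 to n^1.5 gates means at today’s key sizes, here is a minimal comparison that ignores constant factors and the extra quantum memory discussed below; the function names are hypothetical.

```python
def shor_gates(n: int) -> float:
    """Gate count under an n**2 scaling, constants ignored."""
    return n ** 2

def regev_gates(n: int) -> float:
    """Gate count under an n**1.5 scaling, constants ignored."""
    return n ** 1.5

for n in (1024, 2048, 4096):
    ratio = shor_gates(n) / regev_gates(n)  # equals sqrt(n)
    print(n, round(regev_gates(n)), round(ratio, 1))
```

For a 2048-bit number this gives roughly 92,700 gates versus about 4.2 million, a saving of about a factor of 45 (the square root of n), which grows as keys get longer.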

But Regev’s algorithm also 
comes with drawbacks, says 
Martin Ekerå, a quantum computing
researcher with the
Swedish government whom 
Regev consulted with while try- 
ing to understand the practi- 
cal implications of his work. 
Its structure seems to require 
quantum memory to store inter- 
mediate values during the com- 
putation, and that means a need 
for more of those finicky qubits. 
“This drives up the cost of the 
algorithm,” Ekerå says. Regev
acknowledges the concern about 
memory requirements, but says 
the algorithm still could end 
up having value—“maybe when 
memory is cheaper and we instead
worry about the number
of operations.” 

By the time quantum computers
are ready to find prime factors
by implementing either Regev’s or Shor’s
algorithm, internet encryption may have
moved on. Federal agencies and security 
leaders are already shifting to alternatives, 
including so-called “lattice cryptography,” 
which is thought to be immune to quantum hacking.
Even so, algorithms like Regev’s and
Shor’s could be applied retroactively, to decrypt
recorded traffic from the present and
recent past, Ekerå says.

Regardless, Brown believes the sheer nov- 
elty of Regev’s work is likely to inspire and 
generate other new ideas in quantum crypto- 
graphy, which has struggled for significant 
breakthroughs. “I am myself trying to think 
about ways to push this further,” he says.


Anna Kramer is a journalist in Washington, D.C. 
GLOBAL HEALTH


Polio eradication effort 
struggles with end game 


Tom Frieden, co-author of critical new report, explains 
the repeated failures of decadeslong campaign 


By Jon Cohen 


In 1988, a collection of health and nonprofit
organizations launched a vaccination
program that aimed to eradicate
polio by 2000. More than 20 years later, 
the Global Polio Eradication Initia- 
tive (GPEI) has yet to cross the finish 
line. A new, sharply critical report issued on 
8 September by an independent monitoring 
board says GPEI is likely to miss yet more 
deadlines, and faults it for long embracing 
“a highly positive public ‘almost there’ nar- 
rative that was close to magical thinking.” 

Through massive vaccination campaigns 
GPEI has decreased cases of polio by 99%. 
Only Afghanistan and Pakistan still have 
cases of paralysis caused by wild polio- 
virus; seven were documented this year up to 
12 September. Yet paradoxically, the bigger 
threat today comes from the vaccines them- 
selves, which contain a weakened version 
of the virus that on rare occasions reverts 
to virulence. This year, circulating vaccine-derived
poliovirus (cVDPV) has accounted
for 246 cases of paralysis worldwide. 

The first polio vaccine, launched in 1955, 
was a shot containing inactivated polio- 
viruses, and although it protected people 
from paralytic disease, it did little to stop 
transmission. The weakened viruses in the 
oral polio vaccine (OPV), introduced in the 
1960s, block transmission and can spread 
through communities to protect unvacci- 
nated people. OPV is also less expensive and 
easier to use than the inactivated poliovirus
vaccine (IPV). But OPV’s reversion risk led 
many countries to return to IPV after they 
eliminated polio—or to use both. 

The eradication campaign originally re- 
lied on a “trivalent” OPV against the three 
poliovirus types. The majority of cVDPV 
was type 2, so GPEI in 2016 replaced it 
with one that just contained type 1 and type 
3 viruses. This dramatically reduced cases 
of cVDPV. The program then tried to stamp 
out occasional type 2 cVDPV outbreaks with 
a monovalent type 2 vaccine. In 2021, GPEI 
began to use a “novel” version that was less 
likely to revert to virulence. 

But the new report says GPEI will prob- 
ably fail to meet its current goal of stop- 
ping both wild type and vaccine-derived 
virus transmission by the end of this year 
and achieving full eradication by 2026. The 
report blames civil unrest, the COVID-19 
pandemic, and political instability and in- 
difference. It also says GPEI’s approach to 
cVDPV has been “marred by rigid attitudes, 
missed opportunities, lack of foresight, and 
an inability to adapt swiftly.” 

Science spoke with Tom Frieden, one of 
four members of the independent monitor- 
ing board that wrote the report. The former 
head of the U.S. Centers for Disease Control 
and Prevention (CDC), Frieden now runs a 
nonprofit, Resolve to Save Lives, that tar- 
gets epidemics and cardiovascular disease. 
This interview has been edited for brevity 
and clarity. 
A vaccination campaign last month in Afghanistan, one of two countries that still has wild poliovirus.


Q: The report is pretty stinging. You seem 
to also be criticizing GPEI for giving out the 
message, “Hey, we’re there,” and then dropping
their guard.

A: Polio eradication has been way harder 
than anyone anticipated. And yet we 
shouldn’t lose sight of the fact that there are 
millions and millions of children who are 
now adults who are walking without a limp 
because of the eradication initiative. The big 
picture is we are closer than ever to eradica- 
tion. But that doesn’t mean we're close. 


Q: You emphasize that the initiative’s re- 
sponse to wild type cases has been far more 
vigorous than the response to outbreaks of
vaccine-linked cases. 

A: I admit to having been part of this when 
I was at CDC. There was a long-standing 
precept that the wild type virus was the 
really important one, and once that was 
gotten rid of, it would be quite easy to mop 
up [and eradicate the virus]. The challenge 
is that with the switch to the bivalent vac- 
cine, there was the risk that lower baseline 
vaccination rate against vaccine-derived 
type 2 virus would enable outbreaks to 
occur. In places where there were low vac- 
cination rates, outbreaks occurred, as had 
been feared. 


Q: What are the solutions? 

A: Get higher vaccination rates. The only 
reason there’s really cVDPV outbreaks is 
that vaccination rates are very low. If you 
get a 50%, 60% vaccination rate, you don’t 
get the outbreaks. 

The technological fix is the novel OPV 
type vaccines that are more genetically 
stable. It’s a remarkable accomplishment 
that we have a safer vaccine for type 
2 poliovirus. Now, it’s probably going to 
have to get repeated for types 1 and 3. 


Q: Jonas Salk, who developed the first polio 
vaccine—which contained an inactivated 
virus—used to argue that we can never eradi- 
cate polio as long as we are using an OPV. 

A: At some point, you’re going to want to 
stop OPV vaccination, but that point is
not for a long time. You can’t do it while
you've still got spread, because IPV does
not reduce transmission, it reduces disease.
So I don’t think OPV failed and IPV is
the answer. They both have got a really
important role. 


Q: One of the startling things the report 
highlighted was with the five wild type 
polio cases in east Afghanistan this year, 
the children had received between 16 and 28
doses of OPV. 

A: I’m not 100% sure what to make of it. I 
go back to 2014 when we saw something 
like this in West Bengal, where kids were 
getting 15 and 20 doses and getting polio.
I really raked my staff over the coals. Their
conclusion was these vaccines are being
given in communities that have such
severe problems with water and sanitation 
that the kids have almost constant bacte- 
rial and viral infections. And so the efficacy 
of the vaccination is massively lower than 
it would be under optimal conditions. 


Q: What can be done? 

A: The independent monitoring board has 
been saying for a decade that these are not 
huge areas in Afghanistan and Pakistan, so 
could you get them water and sanitation? 
Even if that doesn’t end up mopping up 
their polio, it’s going to save a lot of lives. 
And it will convey to the community, hey, 
we care about not just polio, but you and 
your community. 


Q: What needs to happen now to avoid yet 
another report saying, “We’re going to miss 
the targets”? 

A: In the big picture, we have to get over 
the finish line in Pakistan and Afghanistan. 
That can be done by the usual kind of good 
technical and good political efforts, plus a 
real push on water and sanitation. And we 
have to get much better at mopping up the 
cVDPVs in Africa. 

What the report says pretty bluntly is, 
“You're spending so much of your money on 
your outbreak responses, you can’t do the 
outbreak prevention. So you’re going to be
chasing your tail all the time.” One of the 
ways to break that very negative vicious cy- 
cle is to identify the outbreaks faster. What’s 
called direct detection and nanopore
sequencing can confirm polio cases within 
days rather than months. It’s exciting. Then 
you can vaccinate 10,000 or 100,000 people 
instead of 500,000 or 5 million people. 


Q: So you have some optimism despite the 
bleak report card? 

A: There used to be 365,000 kids para- 
lyzed every year. This is a huge success 
story. And there’s a risk that because it’s 
been so hard, it’s so far behind schedule 
and over budget, that we’ll give up. But
if we do it right, not only will the world
eradicate polio, but it will also strengthen 
systems for routine vaccination, water, 
and sanitation, and it will give people 
confidence that public health can deliver. 
Eradication is the ultimate in both equity 
and sustainability, because it is for every- 
one and forever. 
ARCHAEOLOGY


Ancient logs may represent 
oldest wood construction 


Half-million-year-old structure unearthed in Zambia 
may have been a platform, bridge, or house foundation 


By Phie Jacobs 


Every archaeological dig at Kalambo
Falls is a race against time. The area 
near this giant waterfall, near Zam- 
bia’s border with Tanzania, is a rich 
archaeological site. But seasonal 
flooding forces researchers to quickly 
excavate before artifacts are washed away. 

In 2019, however, the river chose to re- 
veal rather than destroy. Lawrence Barham, 
an archaeologist at the University of Liver- 
pool, and his colleague Geoff Duller, a geo- 
chronologist at Aberystwyth University, 
had just descended a small cliff to a patch 
of beach beside the Kalambo River when 
they spotted the end of a carved digging 
stick protruding from the sandy riverbank. 
“It was kind of a wild moment,” Barham recalls.
The river had washed away the sediment
but, for the moment, left the stick.

This week in Nature, the pair reports 
that an ancient wooden structure un- 
earthed from deeper in that riverbank may 
be the earliest known example of wooden 
construction. Using a newly developed lu- 
minescence dating technique, they deter- 
mined that human ancestors created the 
structure—a pair of interlocking logs joined 
by a carved notch—about 476,000 years ago, 
well before modern humans emerged. 

The discovery of what appears to be part 
of a fixed platform suggests our archaic 
relatives were less nomadic than previously 
thought, Barham says. The find also pro- 
vides a startlingly early date for “when peo- 
ple started to structurally alter the planet 
for their own benefit,” Annemieke Milks, an
archaeologist at the University of Reading, 
wrote in a related Nature commentary. 

Because wood decomposes more quickly 
than stone, wooden artifacts are relatively 
rare, even though “it’s likely wooden tools 
go a long way back,” says Andy Herries, a 
paleoanthropologist and geoarchaeologist 
at La Trobe University. A fragment of pol- 
ished plank from Israel is believed to be a 
whopping 780,000 years old, and a hand- 
ful of other ancient wooden artifacts have 
been discovered around the world, such as 
300,000-year-old spears from Germany. 

At Kalambo Falls, the wet conditions that
make excavation so frustrating are ideal for
preserving wood, Barham explains, because 
waterlogged sediment prevents decomposi- 
tion. Over time, the wood also absorbs min- 
erals dissolved in the water, making it even 
more durable. 

The meter-long logs found there had dis- 
tinctive marks on their surface that were 
likely intentionally made with stone tools. 
Their notched, interlocking design suggests 
they were part of a walkway—perhaps a 
bridge or the foundation of a dwelling. The 
people who built this structure, Barham and 
Duller explain, “were investing in a place.” 

The team determined the logs’ age with 
a technique called applied luminescence 
dating. When buried, some minerals can 
absorb ambient radioactivity and store it as 
energy. When scientists release this stored 
energy in a lab, the minerals glow with faint 
light. Its intensity is a measure of time since 
the sediment was last exposed to daylight. 
Luminescence dating normally relies on 
quartz, but for this study the researchers 
turned to feldspar, which can absorb much 
more radioactivity than quartz. “It’s got 
a bigger battery. It can take more charge,” 
Duller says. The feldspar grains from sedi- 
ment encasing the logs showed they were 
about 476,000 years old. 
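In its simplest textbook form, a luminescence age is the ratio of two measured quantities: the radiation dose the grains have stored since burial, inferred from the glow, and the rate at which the surrounding sediment delivers that dose. The symbols below are generic labels of our own choosing; the study's exact feldspar protocol and corrections are not described in this article.

```latex
\text{Age} = \frac{D_e}{\dot{D}},
\qquad
D_e = \text{equivalent (stored) dose, in Gy},
\qquad
\dot{D} = \text{environmental dose rate, in Gy per thousand years}
```

Because feldspar can store a larger dose before saturating, it can record a larger D_e, and hence an older age, than quartz at the same dose rate.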

“I think the ages are reliable,” says
Richard Roberts, a dating expert who re- 
viewed the paper. If so, the worked logs 
predate Homo sapiens, meaning they were 
likely created by a more archaic human an- 
cestor, perhaps H. heidelbergensis. 

The discovery comes as a surprise be- 
cause most archaeologists thought such 
early hominins were nomadic. Most arti- 
facts from their time were easily carried, 
such as spears and digging sticks. 

But early humans were apparently less 
nomadic and more capable than thought. 
“Our ancestors are so often portrayed as 
cavemen,” Herries says. In contrast, the new 
study suggests they also could build free- 
standing structures and shape the environ- 
ment to meet their needs. “We glimpse the 
archaeology of this time period as if through 
a pinhole,” Herries says, “but every now and 
again a find comes along that opens that 
pinhole a fraction more.” 
RESEARCH FUNDING


South Korea, a science spending 
champion, proposes cutbacks 


Sudden revamp would trim basic research spending while 
boosting biomedical innovation, space, and other fields 


By Dennis Normile 


South Korea’s government surprised
many of the nation’s scientists last
month when it abruptly proposed
cutting research spending by 10.9%
in 2024 and shifting resources into
a number of new initiatives, including
efforts to build rockets, pursue high-risk
biomedical research, and build a U.S.-style 
biotech innovation ecosystem similar to the 
one that has grown up around Boston. 

Officials said the dramatic restructuring 
—which could end a decadeslong surge 
in science spending that has helped make 
South Korea a global research force—was 
needed to help tame growing budget defi- 
cits and to focus resources on the most 
productive fields. One goal, the science min- 
istry said, is to create “innovative global top 
strategic research groups that can generate 
groundbreaking results.” 

Many researchers, however, are anxious 
about the plan, which the National Assem- 
bly is expected to approve by December. 
They say key budget details remain murky 
and that President Yoon Suk Yeol’s admin- 
istration has made little effort to consult 
with researchers. “Without any discussions 
with scientists, they just suddenly changed 
the whole [funding] system,” says Ji-Joon 
Song, a structural biologist at the Korea Ad- 
vanced Institute of Science and Technology 
(KAIST). “This is what makes scientists re- 
ally upset, not just cutting the budget.” 

In recent years, South Korea has been a 
rising star in science funding. Hefty government
outlays and robust private investment
drove total support for R&D from about 
3.9% of gross domestic product (GDP) 
a decade ago to more than 4.9% in 2022. 
Only Israel, which spends 5.9% of GDP on 
R&D, ranked higher. (The United States 
spends 2.6%.) 

In late June, however, Yoon signaled a 
tightening of government support, though in- 
dustrial R&D spending is expected to remain 
strong. After ministries requested a modest 
increase in science spending, Yoon directed 
that research budgets be “overhauled starting 
from zero.” And he urged ministers to “boldly 
confront” what he called “predatory interest 
cartels” in the research community. 

That phrase “shocked” many researchers, 
says So Young Kim, a science policy special- 
ist at KAIST. “Everyone was wondering: ‘Am 
I part of a cartel?’” 

Yoon apparently was referring to pro- 
grams that give grants to institutes, small 
firms, and some academics without compet- 
itive review or much governmental control. 
Whether these groups are cartels “is quite 
debatable,” Kim says. 

On 28 August, South Korea’s State Coun- 
cil adopted the hastily revamped budget. It 
includes 25.9 trillion won ($19.5 billion) for 
science and engineering, according to the 
science ministry, and boosts spending for 
areas including artificial intelligence, semi- 
conductors, and space launch technologies. 
But funding for basic research drops by 
6.2%, and funding for national research in- 
stitutes, including KAIST and the Institute 
for Basic Science (IBS), drops by 9.4%. 
South Korean President Yoon Suk Yeol’s budget would promote space launch technologies.


One big question is whether the 
institutes—which conduct most of the na- 
tion’s basic science—will have a say in allo- 
cating cuts. IBS said it was “inappropriate to 
answer [such] questions” while legislators 
are still reviewing the budget proposal. Song, 
who helps lead Basic Research United, which 
represents 30 science groups, fears some 
younger researchers will be hit hard by the 
revamp. For example, one program facing 
termination provides annual grants of about 
70 million won to virtually all academic re- 
searchers. The grants allow early-career re- 
searchers and those at local universities to 
establish track records so they can eventually 
qualify for more competitive awards, says molecular
biologist Jung-Shin Lee of Kangwon
National University. Ending the program, Lee 
says, “will ultimately have an adverse impact” 
on the whole research sector. 

Students are already taking note, says 
Dongheon Lee, an engineering Ph.D. candi- 
date at KAIST who leads the school’s gradu- 
ate student association. There is “a growing 
perception that [science and engineering] 
careers are less stable and less lucrative 
than [those in] other fields,” Lee says. His
group and five others have asked the gov- 
ernment to reconsider the cuts. 

South Korea’s health ministry, mean- 
while, announced two ambitious initia- 
tives. One is a “Korean version” of the 
U.S. Advanced Research Projects Agency 
for Health, which seeks to fund high-risk, 
high-reward biomedical studies. The plan 
calls for it to get 1.8 trillion won over 
10 years, depending on future appropria- 
tions. The new agency aims to reverse the 
“not so impressive” outcomes of South Ko- 
rea’s biomedical research, and to embrace 
“a tolerance of failure and the ability to 
recover from failure,” says Kyung Sun, a
former surgeon and adviser at Kyung Hee 
University who has promoted the idea. Priority
targets include accelerating vaccine
development and reducing cancer rates. 

The second initiative, called the Boston- 
Korea project, would get 60.5 billion won 
in 2024 to forge links between Korean and 
Boston-area research institutions. On a visit 
to the Massachusetts Institute of Technol- 
ogy in April, Yoon said Boston’s biotech clus- 
ters could serve as a model for South Korea. 
Song, who was a research fellow at Harvard 
Medical School, endorses the goal. But, “We 
cannot make a huge biotech cluster out of 
the blue in a short amount of time,” he cautions. “We need a long-term plan.”


With reporting by Ahn Mi-Young in Seoul, 
South Korea. 
SIDELINED


Svetlana Mojsov helped discover the hormone GLP-1, paving the way for blockbuster obesity drugs. Now, she’s fighting for recognition

By Jennifer Couzin-Frankel


When Svetlana Mojsov heard the spring 2021 announcement, she was startled. The Canada Gairdner International Award, a prestigious biomedical research prize, would be bestowed on three scientists for work underpinning the diabetes and obesity drugs that have exploded in popularity in recent years. “I was really upset,” recalls Mojsov, a chemist at Rockefeller University. The Gairdner award marked the third time in 4 years that the same trio of
scientists—Joel Habener at Massachusetts
General Hospital (MGH), Daniel Drucker 
at the University of Toronto, and Jens Juul 
Holst at the University of Copenhagen—were 
honored for work that began in the 1970s 
and ’80s on glucagon-like peptides. That 
research identified glucagon-like peptide-1 
(GLP-1) as a hormone churned out by gut tis- 
sue that triggers insulin release in the pan- 
creas. Drugs mimicking GLP-1, called GLP-1 
agonists, would later become blockbusters, 
raking in billions of dollars and transforming
treatment of diabetes and obesity. They
are the first that appear to safely and con- 
sistently cause marked weight loss, and last 
month they were reported to stave off heart 
disease associated with obesity. Millions of 
people take them, and speculation is stirring 
about an eventual Nobel Prize. 

But this upbeat narrative of scientific dis- 
covery is missing an important piece: Mojsov 
herself. She, too, was at MGH in the 1980s 
and published early key papers on GLP-1. 
She later fought, successfully, to be added to 
crucial patents that initially omitted her as 
a co-inventor. 
And yet Mojsov’s career followed a different
course from those of the three men. She
would never run her own lab or secure major, 
consistent funding. She would publish far less 
often. Married to a star immunologist, she es- 
chewed the unswerving pursuit of her own 
research in favor of balance, often helping 
younger colleagues advance their work while 
remaining out of the scientific limelight her- 
self. Those choices may have come at a cost. 

“There’s a hard line being drawn between 
these three gentlemen and Svetlana when it 
sure seems that there is a hell of a lot of over- 
lap” in their early discoveries, says Richard 
DiMarchi, a peptide chemist at Indiana Uni- 
versity Bloomington who has helped develop 
diabetes drugs. Though her subsequent 
prominence “was far less ... why should that 
have any relevance for who gets credit?” 

Consistently described by friends as “not
a self-promoter,” Mojsov is now fighting to
amend the record, urged on by some support- 
ers. A chemist friend submitted a request for 
a correction to The New York Times after a 
lengthy piece on GLP-1 research last month 
didn’t mention her. She reached out to Nature 
to protest her omission from a January ar- 
ticle, and says the journal will be publishing 
a correction. Another correction appeared in 
Cell after she voiced objections to a 2021 es- 
say on GLP-1 that sidelined her. 

Mojsov’s story raises thorny questions 
about the scientific enterprise, including how 
credit is apportioned and how award deci- 
sions are made. Several in the obesity and 
diabetes field—including those being celebrated
for GLP-1—express unease with her
near-total absence from the narrative. But 
few have come forward to advocate for her 
place in that story. 

“I think it’s a question of integrity” for
people to speak up, says Mojsov, now in her 
early 70s. “I still don’t understand how I 
was excluded.” 


MOJSOV CAME to Rockefeller’s graduate pro- 
gram in 1972 from Belgrade, in what was then 
Yugoslavia, where she got her undergraduate 
degree in chemistry. She recalls her room— 
a single—in a Rockefeller dorm as luxurious. 
“You look out on the garden, it’s green, very 
beautiful,” she says.

Mojsov was drawn to the lab of Bruce 
Merrifield, a renowned chemist who later 
won a Nobel Prize for his efficient method of 
synthesizing bits of protein called peptides. 
In Merrifield’s lab she focused on glucagon, 
a hormone released by the pancreas that acts 
as a check on insulin. Whereas insulin low- 
ers blood glucose, glucagon raises it, and sci- 
entists thought suppressing glucagon might 
help treat type 2 diabetes. Testing that idea 
required a steady supply of glucagon, and 
others had struggled to synthesize it by the 
method that Merrifield had pioneered. “People
said it can’t be done,” says chemist George
Barany, who was also in the Merrifield lab 
and is now at the University of Minnesota. 
“Svetlana got it to work.” 

Barany and Mojsov shared an office and 
struck up a friendship that has endured 
for 50 years. “She’s just so kind and hum- 
ble and curious,” Barany says. He helped 
Mojsov with her English as she wrote her 
dissertation and the two became scientific 
“confidantes,” he says. Both loved opera 
and ballet and sometimes ran into each 
other at performances. 

During graduate school Mojsov also met 
her husband-to-be, immunologist Michel 
Nussenzweig, then immersed in medical 
school at New York University and a Ph.D. pro- 
gram at Rockefeller. He would bring her cups 
of tea to alleviate the stress of her dissertation 
writing, Mojsov recalls. As Nussenzweig’s 
training ground on, she mastered glucagon 
synthesis and stayed in Merrifield’s lab as a 
postdoc to refine her techniques. 




In the early 1980s, Nussenzweig was of- 
fered a medical residency at MGH. Mojsov 
was recruited to join the endocrine unit there 
as an instructor. She also became head of a 
new facility that would synthesize peptides for 
the unit’s scientists. Filling these orders “was 
not really a very time-consuming job,” Mojsov 
says, allowing her to pursue her own research. 
She was given one lab bench and could af- 
ford just a single technician, but Mojsov 
knew what she wanted to study: a still- 
mysterious peptide called GLP-1. 

Initial details about it had emerged from 
the lab of a rising-star endocrinologist: 
Habener. He and his team were studying key 
hormones, including glucagon, in pancreases 
from anglerfish, which they pulled from Bos- 
ton harbor. They froze the fish’s hormone- 
producing pancreatic islet cells in search of 
uncharted DNA inside, ultimately cloning a 
gene called proglucagon. In 1982, just before 
Mojsov arrived at MGH, they reported the 
fish gene encoded a large precursor protein 
that the body chops apart to form glucagon. 
Also embedded in the proglucagon protein 
was a stretch of amino acids that resembled 
glucagon and would come to be called GLP-1. 
Subsequent looks at the amino acid sequence 
of proglucagon in mammals, including work 
in hamsters and humans by Graeme Bell
at Chiron Corporation, revealed a second
glucagon-like peptide, GLP-2. 

GLP-1’s amino acid sequence also shared 
some features with gastric inhibitory peptide, 
or GIP, which was then the only known mem- 
ber of a fabled category of hormones called 
incretins. Incretins are produced by the gut 
and spur the pancreas to release insulin—a 
function scientists thought could make them 
useful for studying and even treating type 2 
diabetes. But GIP was a disappointment. Giv- 
ing it to people with diabetes had had little 
effect on their insulin levels. “GIP was a complete
bust,” Habener says.

Both he and Mojsov wondered whether 
GLP-1 would turn out to be different. One 
step toward finding out was to discern where 
in the body the active form of the peptide was 
made; this is the portion that, when cleaved 
from the parent protein, would become bio- 
logically active. In her small office, Mojsov 
scrutinized the string of 37 amino acids 
making up the mammalian GLP-1 sequence. 
Based on its similarity to glucagon, and the 
way biologically active glucagon is produced, 
she hypothesized that a stretch of 31 amino 
acids from spots 7 to 37 within the larger 
GLP-1 peptide might be an incretin. On a 
sheet of paper printed with the proglucagon 
amino acid sequence, which Mojsov still has, 
she jotted down what the functional portion 
of GLP-1 would be. (Years later, she would 
explain her scientific rationale in a paper.) 
“Then,” she says, “I went to prove it.” 

To see whether the 7-37 fragment was 
present in the intestines, as you’d expect of 
an incretin, Mojsov needed to fish for it with 
antibodies. The peptide itself could appear 
in minute quantities, but an antibody would 
more clearly flag its presence. 

Making antibodies was a laborious task. 
Mojsov first cranked out GLP-1 in abun- 
dance, storing it in glass vials. Then she in- 
jected rabbits with different segments of her 
labmade peptide and waited 2 months for 
the antibodies to proliferate in their blood. 
She collected blood from a neck artery, an
experience that taught her how much she
disliked working with lab animals. “I’d just
go home and take a shower” after that, she 
says. Mojsov then isolated the antibodies, 
another painstaking process. 

Two floors below, Habener and his team 
were beginning to probe the biology of GLP-1. 
In 1984, the lab brought on a new postdoctoral
fellow, Drucker, who set out to identify which 
cell types could produce the peptide. An 
endocrinologist who'd never worked in a lab, 
Drucker says he quickly found there was little 
hand holding in this one. “You 100% had to 
show initiative or you would flounder.” 

Foundations of a blockbuster

In the 1980s, Svetlana Mojsov belonged to a small group of scientists who helped decipher the function of the glucagon-like peptide-1 (GLP-1) hormone. In the years that followed, researchers and drug companies built on those findings to develop blockbuster obesity and diabetes drugs.

Svetlana Mojsov arrives at Rockefeller University for graduate school.
Joel Habener and his lab members report a glucagon-like sequence in fish.
Mojsov arrives at Massachusetts General Hospital and launches her GLP-1 research.
Mojsov, Habener, and colleagues report finding GLP-1 in the intestines; at about the same time, Jens Juul Holst in Copenhagen reports the same.
Habener’s postdoc, Daniel Drucker, reports that GLP-1 induces insulin in rat pancreas cells.
Mojsov and Habener, with Gordon Weir, report that GLP-1 induces insulin in a whole rat pancreas; Holst has a similar report in the pig pancreas.
David Nathan and Habener, with GLP-1 synthesized by Mojsov, publish on GLP-1 in humans.
Two patents on GLP-1 are issued, with Habener listed as the sole inventor. More will follow.
A team in the United Kingdom reports that in rats, GLP-1 can cause appetite loss.
With help from attorneys, Mojsov challenges the patents, seeking co-inventorship.
Mojsov is added to four patents as a co-inventor.
The first Novo Nordisk GLP-1 drug, liraglutide (Victoza), is approved in the United States for diabetes.

Mojsov was never part of Habener’s lab.
But she says he knew of her work, and in
the summer of 1984 Drucker approached
her, at Habener’s suggestion, to discuss a
collaboration. Mojsov explained she'd al- 
ready produced antibodies to different 
stretches of GLP-1 and developed ways to de- 
tect its presence. She and some members of 
Habener’s lab joined forces to employ her de- 
tection methods, tracking different stretches 
of GLP-1 peptide in various rat tissues. Work- 
ing largely alone, Mojsov says she found the 
7-37 stretch of GLP-1, which she hypothesized 
was the active form, in rat intestines. 

In 1986, Mojsov and Habener published a 
joint paper, which detailed the discovery of 
7-37 in the gut. The paper appeared in The 
Journal of Biological Chemistry and is now 
widely considered a landmark in the field. 
It listed Mojsov first and Habener last, the 
spot for a senior author. 

The next question was whether the 
7-37 form of GLP-1 in the gut was bio- 
logically active—in particular, whether it 
could trigger insulin release in the pan- 
creas. Using GLP-1 synthesized by Mojsov, 
Drucker led a study showing that GLP-1 did 
indeed prompt insulin secretion in a line of 
rat pancreatic islet cells. To test its effects in 
a whole organ, Habener contacted a friend, 
endocrinologist Gordon Weir, who had de- 
veloped a rat pancreas model—kept oxygen- 
ated and at body temperature in a Plexiglas 
box, so researchers could measure its insu- 
lin secretion minute by minute. 

When Weir injected the rat pancreas with 
GLP-1 that Mojsov had synthesized, insulin 
output ticked up. “We kept putting in less 
and less” peptide, Weir says, and he was as- 
tonished that even tiny doses had an effect. 
Mojsov measured out how much GLP-1 was 
infused, to confirm the peptide aligned with 
the insulin response. The two hormones “go 
exactly up in parallel,” she says. “It was a beautiful experiment.”

The resulting paper, published in 1987 in The Journal of Clinical Investigation, listed just the three scientists: Mojsov was first, Weir second, and Habener last. It’s “probably the most important paper I’m associated with,” Weir says.

For Habener, Drucker’s work combined with the rat pancreas study was “a good double whammy,” confirming that GLP-1 was one of the long-sought incretins.

Now in his late 80s, Habener remembers Mojsov as an important collaborator. “She was involved in the beginning, pioneering work,” he says, “deciphering what the real active GLP-1 peptide is.” And Mojsov’s ability to quickly and accurately synthesize large batches of peptide “put us a leg up” on some fierce competition. In Copenhagen, Holst and his colleagues were on the same trail, publishing on GLP-1’s active form and insulin-secretion powers at almost exactly the same time.

The MGH group was first to report testing GLP-1 in people. Habener teamed up with a young MGH diabetes specialist, David Nathan, who infused the peptide into healthy people and those with diabetes. GLP-1 prompted insulin release when glucose levels rose—after eating, for example. Nathan sees both Mojsov and himself as a “supporting cast,” whereas Habener was “the first one with the light bulb that went on,” and who deciphered the significance of GLP-1.

By the time the Nathan paper appeared, in Diabetes Care in 1992, Mojsov had settled back in New York City. She and Nussenzweig had moved 2 years earlier, after he received an alluring job offer from Rockefeller. “It was time to leave” MGH, says Mojsov, who recalls her excitement for

Foundations of a blockbuster (continued)
Liraglutide becomes the first GLP-1 drug approved for weight loss.
Habener, Drucker, and Holst win the Canada Gairdner Award, the third award in 4 years shared by the trio.
GLP-1 drug sales are estimated at $22 billion.


AT ROCKEFELLER, she joined the lab of 
immunologist and future Nobel laureate 
Ralph Steinman, initially as an assistant pro- 
fessor. Mojsov had a toddler and an infant, 
and “that takes time away,” she says. “One
has to balance with young children the career 
you're about to start.” 

She shifted to studying GLP-1 biology in 
fish, supported by a grant from the National
Science Foundation and collaborating with 
scientists working on fish glucose metabo- 
lism. She also offered lab members help with 
peptide biology, finding that mentoring and 
collaborating with junior women scientists 
brought particular fulfillment. 

“I was a postdoc but she was committed 
to my work,” says Leonia Bozzacco, now an
immunologist and infectious disease scientist
at Regeneron. Mojsov helped another lab
member, Sayuri Yamazaki, now at Nagoya 
City University, practice her talk for a promo- 
tion at Rockefeller and for job interviews in 
Japan and Singapore. 

Ultimately, Mojsov found a scientific 
home in Steinman’s lab; she stayed for 
more than 20 years, until his death in 2011. 
“I always thought I would move on but
didn’t,” she says. 

Meanwhile, work on GLP-1’s effects in hu- 
mans was moving forward, with other inves- 
tigators at the fore. Studies in the 1990s led 
by Holst and Michael Nauck, an endocrino- 
logist now at Ruhr University Bochum, found 
that unlike GIP, GLP-1 could normalize blood 
sugar levels in people with diabetes. Other 
studies reported that in rats, GLP-1 caused 
appetite loss, an early hint that the peptide 
might target obesity. At drug companies, 
scientists labored to overcome the peptide’s 
short life span before it’s degraded by the 
body, eventually finding mimics that were 
more practical as drugs. 
Mojsov took great pride in her GLP-1 work at
MGH and wondered whether patent applica- 
tions had been filed. Her inquiries to Habener 
went unanswered, she says. (Habener does 
not recall this communication.) In 1996, she 
mentioned her curiosity to an employee at a 
biotechnology company, who told her patents 
had been granted years prior. 

Mojsov quickly found two 1992 patents on 
a “fragment” and “derivatives” of GLP-1 that 
had the ability to prompt insulin secretion. 
A third patent would be awarded in 1997. 
All listed Habener as the sole inventor. “I 
was shocked,” Mojsov says. She ultimately
engaged a law firm to help her fight for co- 
inventor credit. 

Most patent practitioners consider inven- 
torship “one of the more gray areas in pat- 
enting,” says patent attorney Michael Davis, 
who assisted with Mojsov’s case for several 
years while at a boutique law firm, Klauber 
& Jackson. 

The law requires making a “not in- 
significant” contribution to the “con- 
ception of the claimed invention,” 
rather than simply carrying out ex- 
periments. Disagreement over which 
contributions meet this threshold can 
spur inventorship squabbles, patent 
experts say.

The fight with MGH’s patent office 
dragged on for years, long after the 
patents were licensed to drug com- 
pany Novo Nordisk and it was deep in 
its GLP-1 drug development program. Even- 
tually, between 2004 and 2006, MGH agreed 
to amend four patents to include Mojsov (a 
fourth had been issued, to Habener alone, 
in 2005), and the United States Patent and 
Trademark Office accepted these changes in 
inventorship. A fifth patent was awarded in 
2006 to both scientists. 

Mojsov says MGH agreed to award her one- 
third of drug royalties, with Habener getting 
the rest. In 2010, U.S. regulators approved the 
first Novo Nordisk GLP-1 agonist drug, called 
liraglutide and sold under the brand name 
Victoza, for diabetes. Royalties flowed in—but 
stopped after little more than a year, when the 
first patent expired, Mojsov says. She declined 
to name the payout, but says that “for an aca- 
demic it was still nice, no complaints.” Holst 
and Drucker say they never benefited finan- 
cially from GLP-1 agonist drugs. 

Drucker sometimes wonders whether he, 
too, should have pursued co-inventorship. 
“It’s my data and Svetlana’s data” in the
patents, he says. “I’ve often said to myself,
‘Maybe you made a mistake, Drucker.’” Still,
gratitude to Habener for helping shape his 
career led him to decide “it wasn’t materially 
important to me” to be included. 

Habener says he also has regrets. “I 
didn’t think it would matter that much” 
who was listed, he says. “In retrospect, it
would have been much better to have all 
these people” who contributed on the pat- 
ents. He says he remembers agreeing to 
include Mojsov when she first requested it, 
and told Science he did not recall the multi- 
year patent fight that ensued. 

The patent dispute was exhausting, 
Mojsov says now. At times, her research 
stalled. After the first drug came out, she 
told herself, “I’m going to put all of this 
behind me.” 


BUT THE BOOK on the GLP-1 saga kept falling 
open. Mojsov continued her work at Rock- 
efeller, now as a research associate professor; 
she did not oversee a lab but collaborated 
with various scientists. And she watched 
GLP-1 agonists gallop ahead. New versions 
were approved for diabetes and interest in 
using them for obesity exploded. Then in
2017, Habener, Drucker, and Holst jointly
won the Harrington Prize for Innovation in 
Medicine, given by the American Society for 
Clinical Investigation and the Harrington 
Discovery Institute, “for their discovery of 
incretin hormones and for the translation of 
these findings into transformative therapies.” 
In 2020 came the Warren Alpert Foundation 
Prize from Harvard Medical School, and in 
2021 the Gairdner award. 

“Awards are the big way” credit is given 
in science, says Jeffrey Flier, an obesity 
and diabetes researcher and former dean 
of Harvard Medical School. Flier was on 
a subcommittee for the 2020 Alpert prize 
and helped craft nominations for Drucker, 
Habener, and Holst. He reviewed the litera- 
ture, and because he was less familiar with 
Holst, he spent hours speaking with people 
who knew the scientist’s work. Flier doesn’t 
recall whether Mojsov’s name came up in 
the discussions. Regardless, he says he does 
not view her “as the discoverer,” but rather
someone contributing insights into peptide 
biology as part of MGH’s larger GLP-1 effort, 
in the group led by Habener. 

Flier and others note that award commit- 
tees typically focus on scientists nominated by 
institutions and colleagues. Without a prominent
post and an enduring voice in GLP-1
research, Mojsov may have been at a disadvantage.
Drucker agrees. “From my point of
view Svetlana did very important work in the 
field, no one should discount that,” he says,
adding that he cites her papers and names
her on slides showing “the community that
built this story.” At the same time, “it’s easier
to get credit if you then have built the field for
decades. You have greater visibility,” he says.

To DiMarchi, such arguments are under- 
standable, but shouldn’t matter if a specific 
discovery is being lauded. The patents, he 
says, are key: They indicate that for GLP-1 
“there were only two inventors, that’s Joel 
Habener and Svetlana.” 

Mojsov, an intensely private person, 
told almost no one about the GLP-1 saga 
until recently—not her old friend Barany 
or her former mentee and now friend 
Bozzacco. She declines to engage her hus- 
band, a powerful scientific figure, on her 
behalf. “We have separate careers, 
separate identities,” she says. “I al- 
ways felt that it was completely 
inappropriate for him to try and ad- 
vertise in a field which was not his.” 

Barany and his brother, chem- 
ist Francis Barany at Weill Cor- 
nell Medicine, are now supporting 
Mojsov in speaking up, along with 
some colleagues at Rockefeller. 
George Barany submitted the correc- 
tion request about her absence from 
last month’s New York Times piece. 
“This is a story that has happened over and 
over again in science,” Francis Barany says. In 
Mojsov’s case, “There are no villains,” he
adds. “You don’t have to say that somebody 
hogged credit,” but rather that she isn’t get- 
ting the recognition she deserves. 

All three principals—Habener, Drucker, 
and Holst—attest to Mojsov’s vital contri- 
butions. “I’m on Svetlana’s side, I really feel 
sympathetic,” Habener says. “I wish there was 
something I could do.” 

These comments crack Mojsov’s typically 
unruffled exterior. “Of course they will say I 
deserve more recognition, but then they take
the recognition that belongs to me,” she says. 
“They assign it to themselves.” 

Bozzacco recalls that for years, the only 
clue to Mojsov’s connection to GLP-1 agonist 
drugs was a yellow pen on her desk, printed 
with the name of a Novo Nordisk diabetes 
drug. Because of the pen, “I had always associated
Svetlana with that drug,” without
knowing the story behind it, she says. In 
some ways she is not surprised that Mojsov 
has been largely quiet about her critical role. 
“I can hear her say ‘It’s fine, let’s move on,’”
she says. “Her attitude is always pragmatic.” 
But Bozzacco also understands why Mojsov 
is fighting now. “If everybody else is getting 
recognition, why not her?”
Predicting pathogenic protein variants


Machine-learning algorithm uses structure prediction to spot disease-causing mutations 


By Joseph A. Marsh and Sarah A. Teichmann


Many of the genetic mutations that cause disease in humans occur in protein-coding regions. Although the capacity to sequence DNA and identify these variants has substantially increased, the ability to interpret their effects remains limited.
This problem is particularly acute for mis- 
sense variants, which involve substitution 
of a single amino acid residue and make up 
the overwhelming majority of “variants of 
uncertain significance” (VUS), as classified 
by clinicians (1). On page 1303 of this issue,
Cheng et al. (2) present AlphaMissense, a 
variant effect predictor (VEP) machine- 
learning algorithm that builds on the 
AlphaFold methodology for predicting pro- 
tein structures from gene sequences (3). 
The authors demonstrate superior perfor- 
mance by AlphaMissense across multiple 
benchmarks compared with that of VEPs
that are available now, which is likely to im- 
prove the interpretation of sequencing data 
and advance the role of computational pre- 
dictions in the diagnosis of genetic disease. 

Recent years have brought exciting de- 
velopments in experimental approaches for 
variant characterization using multiplexed 
assays of variant effect (MAVEs), which are 
the focus of the Atlas of Variant Effects proj- 
ect (4). A single MAVE experiment can en- 
able the direct measurement of the effects 
of tens of thousands of genetic variants by 
using in vitro assays of cellular fitness or dif- 
ferent molecular phenotypes, with genotype 
linked to phenotype through high-through- 
put DNA sequencing. Although MAVEs have 
shown promise for interpretation of patho- 
genic mutations in certain genes (5), at pres- 
ent, experimentally derived variant effect 
maps are only available for a tiny fraction of 
the human genome. Thus, the use of compu- 
tational VEPs for generating in silico variant 
effect maps remains essential. 
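To make the genotype-to-phenotype link concrete, the sketch below shows one simplified way that MAVE read counts might be converted into per-variant scores. It is an illustration under stated assumptions, not the protocol of any particular study. It assumes hypothetical sequencing counts for each variant before and after selection in a fitness assay. Each variant is scored as the log2 change in its frequency relative to the wild-type sequence, so variants depleted by selection receive negative scores. The function name, input format, pseudocount, and variant labels are all hypothetical.

```python
import math


def enrichment_scores(counts_before, counts_after, wild_type="WT", pseudocount=0.5):
    """Score each variant by its log2 change in frequency relative to wild type.

    counts_before / counts_after: dict mapping a variant name to its sequencing
    read count before and after selection (a hypothetical input format).
    The pseudocount avoids log(0) for variants that vanish after selection.
    """
    wt_ratio = (counts_after[wild_type] + pseudocount) / (
        counts_before[wild_type] + pseudocount
    )
    scores = {}
    for variant, before in counts_before.items():
        if variant == wild_type:
            continue  # wild type is the reference, so it gets no score of its own
        after = counts_after.get(variant, 0)
        variant_ratio = (after + pseudocount) / (before + pseudocount)
        # Normalizing by wild type makes 0 mean "behaves like wild type".
        scores[variant] = math.log2(variant_ratio / wt_ratio)
    return scores
```

For example, `enrichment_scores({"WT": 1000, "A10V": 500, "G12R": 400}, {"WT": 2000, "A10V": 990, "G12R": 20})` scores the made-up variant A10V close to 0 (wild-type-like) and G12R strongly negative (depleted by selection). Real MAVE pipelines add replicate handling, error models, and quality filters that this sketch leaves out.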

To create AlphaMissense, Cheng et al.
adapted AlphaFold2 (3), which has revolu- 
tionized the computational prediction of 
protein structures, to the problem of variant
effect prediction. They combined structure
prediction with other strategies that have 
proven successful in variant effect prediction, 
specifically, protein language modeling (6) 
and fine-tuning against allele frequencies in 
human and nonhuman primate populations 
(7). The outputs of AlphaMissense are patho- 
genicity scores, which reflect the likelihoods 
of mutations to cause disease, rather than 
predicted changes in protein structures. 

The use of protein structural information 
within AlphaMissense makes it an interesting
contrast to many other state-of-the-art
VEPs. Although structures can be useful for 
predicting biophysical properties such as 
protein stability and interactions, they have 
generally been of little benefit for clinical 
variant effect prediction. For instance, the 
top-performing VEPs from a recent bench- 
marking study (5) were evolutionary model 
of variant effect (EVE), which uses multiple 
sequence alignments (8), and ESM-1v, a protein
language model that is trained on millions
of unaligned amino acid sequences (6).
Importantly, Cheng et al. show that inclusion
of the AlphaFold structure prediction stage 
is required for achieving highly accurate 
predictions. This, in combination with the 
strong performance recently reported by another
structure-based VEP, PrimateAI-3D (7),
suggests the beginning of a new phase in var- 
iant effect prediction in which incorporation 
of protein structure models, with other fea- 
tures (sequence alignments, allele frequen- 
cies, MAVE data), or language models may 
become essential to remain competitive.

One limitation of AlphaMissense is that
the structural component of the predic- 
tor does not, at present, account for most 
proteins being assembled into complexes 
or condensates with diverse quaternary 
structures. Mutations in proteins that form 
complexes can cause disease in ways that 
will not be obvious when considering only 
monomeric structures. For example, such 
mutations might disrupt protein interfaces 
or have more subtle, long-range allosteric 
effects (9). Interactions can also differ from 
one cell type or state to another through 
variations in protein abundance; the vast 
number of cell types and states that exist 
are only now becoming apparent through 
the Human Cell Atlas project. Moreover, al- 
though many disease-associated mutations 
cause a loss of function through protein 
destabilization or disruption of complex 
assembly, in other cases, mutant proteins 
cause disease through dominant-negative 
or gain-of-function effects (10). Thus, it will
be interesting to see how AlphaMissense 
performs on non-loss-of-function variants, 
which tend to involve less-disruptive amino 
acid substitutions and are predicted poorly 
by nearly all previously tested VEPs (10).
Ultimately, incorporating quaternary structure,
which could be facilitated by algorithms
that can predict protein complex structures 
(11), may lead to greater improvements in 
variant effect prediction. 

A major challenge when new computa- 
tional predictors are released lies in judg- 
ing their performance. Self-assessment by 
a method’s creators tends to be unreliable, 
with nearly all new VEPs reporting them- 
selves to be better than all those against 
which they have been tested (12). Despite 
this, the approach of Cheng et al., which in- 
volved comparing AlphaMissense to many 
existing VEPs using a variety of orthogo- 
nal assessment strategies, is encouraging. 
AlphaMissense has been trained on human 
allele frequencies, which potentially makes 
it somewhat susceptible to inflated perfor- 
mance, when assessed on traditional bench- 
marks of distinguishing pathogenic from 
benign variants, compared with more unsu- 
pervised VEPs (13). However, the remarkable 
results when testing AlphaMissense against 
independent data from MAVEs suggest that
its performance is unlikely to be strongly 
influenced by training bias (5). The authors 
have made AlphaMissense predictions across 
the entire human proteome, including for al- 
ternate isoforms, freely available as a com- 
munity resource, which will promote rapid 
independent evaluation and incorporation 
into sequence analysis pipelines. 
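The two kinds of assessment described above can be illustrated with two metrics that are commonly used for benchmarking VEPs. The first is the area under the receiver operating characteristic curve (AUROC), which measures how well scores separate known pathogenic from known benign variants. The second is Spearman rank correlation between predicted scores and MAVE measurements. This is a minimal pure-Python sketch with hypothetical function names. It is not the evaluation code of Cheng et al., whose exact metrics and datasets may differ.

```python
def auroc(pathogenic_scores, benign_scores):
    """Probability that a random pathogenic variant outscores a random benign one.

    Ties count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)
```

With made-up data, `auroc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])` is 8/9, because 8 of the 9 pathogenic/benign pairs are ordered correctly. If higher predicted pathogenicity tracks lower measured fitness, `spearman([0.9, 0.2, 0.6, 0.1], [-2.5, 0.1, -1.0, 0.3])` gives a perfectly inverse ranking, so the correlation is -1. Rank-based metrics are used here because VEP scores and MAVE readouts are on different, often nonlinear, scales.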

Despite the increasing number of meth- 
ods that are available for predicting mis- 
sense variant effects, the clinical impact of 
VEPs remains limited. At present, the extent 
to which computational predictions can be 
used in genetic diagnosis is minimal, pro- 
viding only “supporting” evidence, accord- 
ing to recommendations by the American 
College of Medical Genetics and Genomics 
and the Association for Molecular Pathology 
(14). Recent work seeks to improve this by 
providing more-quantitative interpretations 
of computational variant effect scores and
enabling calibration with different evidence
strengths (15). Yet it remains challenging to
understand how much computational pre- 
dictions can be relied on in diagnosis. Cheng 
et al. attempt a similar approach, classifying
human missense variants as “likely
pathogenic” or “likely benign” on the basis 
of thresholds required to achieve 90% pre- 
cision. Although this will undoubtedly be 
helpful for variant interpretation and prior- 
itization, it is important not to confuse these 
labels with the very specific clinical definitions
of these terms, which rely on multiple
lines of evidence. Furthermore, the concept of mutation pathogenicity is tremendously complex. For example, inheritance and allelic state are usually ignored by computational predictors, and incomplete penetrance may be present, which means that not all people who carry a “pathogenic” allele will manifest clinical disease, owing to factors such as genetic background or environment.
Although no VEPs can, as of now, be relied on 
alone for genetic diagnosis, their utility in the 
diagnostic odyssey will continue to improve 
as both computational approaches and strategies for their interpretation advance.
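The 90%-precision thresholds described above can be illustrated with a short Python sketch. It is not Cheng et al.'s calibration code: the function names, the toy scores and labels, and the choice never to split tied scores are assumptions made for illustration. Given predictor scores and binary labels (1 = known pathogenic), it finds the lowest cutoff whose calls reach a target precision, mirrors the search for the benign side, and leaves everything in between unclassified.

```python
from typing import Optional, Sequence


def lowest_threshold_for_precision(scores: Sequence[float], labels: Sequence[int],
                                   target: float = 0.9) -> Optional[float]:
    """Lowest cutoff t such that calling every score >= t positive reaches `target` precision.

    labels: 1 for a positive (here: known pathogenic) variant, 0 otherwise.
    Returns None if no cutoff reaches the target.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    true_positives = 0
    for n_called, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Only evaluate after the last of any tied scores, so a cutoff never splits ties.
        if n_called < len(ranked) and ranked[n_called][0] == score:
            continue
        if true_positives / n_called >= target:
            best = score  # remember the lowest cutoff seen so far that meets the target
    return best


def benign_threshold(scores: Sequence[float], labels: Sequence[int],
                     target: float = 0.9) -> Optional[float]:
    """Highest cutoff t such that calling every score <= t benign reaches `target` precision."""
    flipped = lowest_threshold_for_precision([-s for s in scores], [1 - y for y in labels], target)
    return None if flipped is None else -flipped


def classify(score: float, pathogenic_cutoff: float, benign_cutoff: float) -> str:
    """Hypothetical three-way labeling from the two cutoffs."""
    if score >= pathogenic_cutoff:
        return "likely pathogenic"
    if score <= benign_cutoff:
        return "likely benign"
    return "ambiguous"


# Toy, made-up scores and labels (1 = pathogenic, 0 = benign).
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
hi_cut = lowest_threshold_for_precision(toy_scores, toy_labels)
lo_cut = benign_threshold(toy_scores, toy_labels)
print(hi_cut, lo_cut, classify(0.5, hi_cut, lo_cut))
```

Cutoffs derived this way are only as good as the labeled set behind them, which is one reason such labels should not be confused with clinical classifications.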
How mothers tolerate their children


Shifting pools of antigen can influence pregnancy-induced immune tolerance 


By Paige M. Porrett 


During pregnancy, maternal immunologic tolerance is required to ensure that the fetus is not “rejected” by the mother’s immune system. This tolerance is achieved by expansion of a subset of CD4⁺ T cells with regulatory properties (Treg cells) that protect against fetal loss (1-3). Maternal Treg cells develop immunologic memory, which confers fetal-specific protection against complications in subsequent pregnancy (4). Prior studies have suggested that memory Treg cells are sustained by fetal microchimerism, the presence of fetal antigens that persist in the mother after birth. But the details of how Treg cells interact with fetal antigens after pregnancy have remained a mystery (5).
On page 1324 of this issue, Shao et al. (6) used a mouse model to demonstrate that these small pools of fetal antigen change between pregnancies and affect the fate of fetal-specific Treg cells in the maternal repertoire. The study therefore reveals how mothers “remember” their genetically distinct children through changes in both the pool of antigen and the pool of memory Treg cells.

Although it is well established that T cell 
populations can be long-lived, versatile, 
and dynamic over time, the concept that an 
antigen pool may be equally long-lived, ver- 
satile, and dynamic is less well developed. 
It is somewhat intuitive that latent viruses 
can be a source of persistent antigen that 
alters T cell responses, but it is much less 
clear how antigens that derive from hu- 
man tissues (alloantigens) are maintained 
inside a genetically distinct individual 
over time. Nevertheless, prior studies have 
demonstrated that fetal antigens not only 
disseminate widely throughout the mother 
during pregnancy (7) but can persist for 
many years (8). Interestingly, exchange of 
alloantigens during pregnancy is bidirec- 
tional; maternal antigens can also deposit 
into offspring, a phenomenon called ma- 
ternal microchimerism (5). 

The study by Shao et al. builds on these
prior studies to illustrate the dynamic nature of microchimerism. Using sensitive
polymerase chain reaction (PCR) assays, 
Shao et al. detected antigens from the 
paternal mouse (sire) in multiple tissues 
of the mother mouse (dam) after comple- 
tion of pregnancy. When these dams were 
later mated with a different sire, the au- 
thors could no longer detect the antigens 
derived from the first sire, suggesting that 
the antigens from the first pregnancy had 
been replaced by antigens from the new 
sire. These observations were not limited 
to just fetal microchimerism but also ex- 
tended to maternal microchimerism in 
daughter mice. Taken together, these data 
demonstrate that tissue antigens can en- 
dure for a long period of time but can also 
be replaced when new antigens arrive. 
The observations reported by Shao et 
al. provoke many questions about how ex- 
actly this shift in the antigen “niche” oc- 
curs: What is the cellular source of this 
microchimerism? If tissue antigens from 
another genetically distinct mouse can 
persist for days or weeks, it is unclear how 
and why they disappear over the course 
of a new pregnancy. Moreover, it is un- 
known whether shifts in microchimerism 
occur in other contexts, such as organ 
transplantation or cancer. Additional ex- 
perimental work will be needed to answer 
these questions. 

A second important contribution of the 
study of Shao et al. is the demonstration 
that tissue microchimerism has immuno- 
logic impact. They correlated maternal T 
cell responses and pregnancy outcomes 
with the presence of microchimerism. 
Notably, fetal-specific Treg cell identity was linked to the presence of microchimerism derived from the same pregnancy. As the microchimeric antigen pool shifted between pregnancies, Treg cells specific for antigens of a prior pregnancy lost expression of the master lineage transcription factor forkhead box P3 (FOXP3) and differentiated into “ex-Treg cells.” Ex-Treg cells in postpartum mice retained the potential for FOXP3 expression, suggesting that they represent committed suppressor cells. However, populations of ex-Treg cells that arise in proinflammatory conditions and acquire cytotoxicity have also been described (9). Although additional molecular studies are needed to better understand the epigenetic programming of ex-Treg cells, these findings add to a growing body
of evidence that fetal microchimerism con- 
tributes to T cell immunomodulation in 
pregnancy and potentially in disease. 

In addition to the effects of fetal antigens on CD4⁺ Treg cells, recent evidence suggests that fetal antigens promote the differentiation of programmed cell death protein 1-expressing (PD-1⁺) CD8⁺ T cells with phenotypic and molecular features of T cell exhaustion, a fate that is adopted by CD8⁺ T cells in the presence of chronic signaling through the T cell receptor (10-12). However, whether fetal microchimerism contributes directly to CD8⁺ T cell fate has yet to be determined, and future studies that investigate the influence of microchimerism on other maternal T cell populations will be needed to understand the full array of consequences of pregnancy on maternal immunity.

Beyond its contributions to an improved understanding of immune tolerance in pregnancy, the study of Shao et al. highlights the importance of understanding antigen persistence in many contexts. Such knowledge could be used to design therapeutics that manipulate antigen depots to coerce tolerance instead of immunity, which has implications for cancer immunology, autoimmunity, and transplantation.
Metabolic control of antitumor immunity


Mitochondrial metabolite reduces melanoma growth by boosting antigen presentation 


By Andromachi Pouikli and
Christian Frezza


The metabolism of cancer cells adapts to meet an increased need for the energy, biosynthesis, and antioxidants required for proliferation, tumor growth, and metastasis (1). Yet whether this metabolic reprogramming affects the
recognition and elimination of cancer cells 
by immune cells has not been well in- 
vestigated. On page 1316 of this issue, 
Mangalhara et al. (2) report a connec- 
tion between a cancer cell’s mitochon- 
drial metabolism and its ability to evoke 
an immune response in a mouse model 
of melanoma. Specific perturbations of 
the mitochondrial electron transport 
chain increased succinate production 
in cancer cells. Succinate accumulation 
caused epigenetic rearrangements, 
which induced the expression of genes 
involved in antigen presentation. 
This promoted the detection of tumor cells by surveilling T cells, a property known as tumor immunogenicity. The identification of a
mechanism by which mitochondrial 
metabolites shape tumor immuno- 
genicity has potential for developing 
anticancer therapies. 

The adaptive immune system com- 
prises cells that recognize and respond 
to various external stimuli through a 
process called antigen presentation. 
During the early stages of tumor de- 
velopment, cytotoxic immune cells, 
such as CD8⁺ T cells, recognize and
eliminate immunogenic cancer cells 
(3), preventing tumor growth and me- 
tastasis. However, as they progress, 
tumors often acquire properties that 
allow them to escape immune detec- 
tion. Metabolism of immune cells is an 
important determinant of their func- 
tion, including anticancer immunity 
(4, 5). But whether metabolism in cancer 
cells affects their immunogenicity has been 
an open question. Mangalhara et al. assessed 
the role of complex I (CI) and complex II 
(CII) of the mitochondrial electron transport
chain on melanoma growth. Pharmacological
inhibition or genetic ablation of CII in mice 
enhanced the antitumor immune response 
by increasing antigen presentation by mela- 
noma cells. This blocked tumor growth. 
How is CII inhibition in cancer cells con- 
nected to the antitumor immune response? 
Metabolism is tightly linked to epigenetics; various metabolites act as substrates or cofactors for DNA- and histone-modifying enzymes that regulate the chromatin landscape and, thus, gene expression (6). For example, α-ketoglutarate, which is produced in mitochondria as part of the tricarboxylic acid cycle, is required for the activity of histone demethylases (6). The function of these enzymes is also regulated by competitive inhibitors, most of which are tricarboxylic acid cycle intermediates, such as succinate (6). Mangalhara et al. demonstrate that in tumor cells, succinate accumulation resulting from CII inhibition decreased the α-ketoglutarate/succinate ratio, which subsequently inhibited histone demethylases. The inhibition of these epigenetic enzymes increased the trimethylation of histone H3 lysine 4 (H3K4) and H3K36 on genes involved in antigen processing and presentation, which induced the expression of these genes and activated T cell-mediated killing of tumor cells (see the figure). These changes were reversed by the addition of α-ketoglutarate, which reactivates histone demethylases.

Metabolic change stops tumor growth
Loss of complex II activity in the mitochondria and concomitant accumulation of succinate inhibit lysine demethylases in the nucleus. This leads to increased histone methylation, especially on MHC class I antigen processing and presentation (MHC-I-APP) genes, which induces their transcription. The resulting enhanced antigen presentation and T cell activity block melanoma growth.
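The competition between α-ketoglutarate and succinate can be made concrete with the textbook Michaelis-Menten equation for a competitive inhibitor. The Python sketch below is illustrative only: Vmax, Km, Ki, and all concentrations are hypothetical values in arbitrary units, and real histone demethylase kinetics are more complex. It shows that raising succinate at fixed α-ketoglutarate lowers the rate, and that adding back α-ketoglutarate restores much of it, qualitatively matching the reversal described above.

```python
def demethylase_rate(akg: float, succinate: float, vmax: float = 1.0,
                     km: float = 1.0, ki: float = 1.0) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    v = Vmax * [S] / (Km * (1 + [I] / Ki) + [S]), where the substrate S is
    alpha-ketoglutarate and the competitive inhibitor I is succinate.
    All concentrations and constants share the same arbitrary units.
    """
    if akg < 0 or succinate < 0:
        raise ValueError("concentrations must be non-negative")
    # A competitive inhibitor raises the apparent Km without changing Vmax.
    apparent_km = km * (1.0 + succinate / ki)
    return vmax * akg / (apparent_km + akg)


# Hypothetical scenarios (arbitrary units, Km = Ki = 1):
baseline = demethylase_rate(akg=10.0, succinate=1.0)        # high aKG/succinate ratio
cii_inhibited = demethylase_rate(akg=10.0, succinate=20.0)  # succinate accumulates
rescued = demethylase_rate(akg=60.0, succinate=20.0)        # extra aKG added back
print(round(baseline, 3), round(cii_inhibited, 3), round(rescued, 3))
```

Because inhibition is competitive, it is the ratio of the two metabolites, not succinate alone, that sets how strongly the enzyme is suppressed in this model.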

The systemic inhibition of CII ac- 
tivity is not a viable therapeutic ap- 
proach because of possible adverse 
side effects, including neurotoxicity 
(7). Instead, Mangalhara et al. pro- 
pose modification of the electron 
transport chain in cancer cell mito- 
chondria. They found that the genetic 
ablation of methylation-controlled J protein (MCJ), a CI-interacting protein, enhanced electron flow through CI. This reduced CII activity and led to succinate accumulation
and higher antitumor immunity. This 
approach could improve immuno- 
therapy success, especially in tumors 
with low expression of antigen pro- 
cessing and presentation genes. 

The results of Mangalhara et al. 
inspire several follow-up questions. 
For instance, the observation that 
the inhibition of CII activity arrests 
tumor growth seemingly contradicts 
the established role of succinate as 
a tumor-promoting metabolite (8). 
This apparent discrepancy, also dis- 
cussed by the authors, is likely ex- 
plained by the context of where tu- 
mor-promoting CII mutations occur. 
Most loss-of-function CII mutations 
are inherited and thus are present 
from the early stages of tumorigen- 
esis, promoting tumor initiation and 
progression (9). By contrast, Mangalhara et al.
show that succinate has a proinflamma- 
tory effect on already-established tumors, 
affecting tumor growth and development. 
Furthermore, CII germline mutations 
might coexist with other oncogenic genetic 
alterations and stimulate tumor initiation 
and progression synergistically, masking 
the possible antitumor effect of succinate. 
Future research should elucidate the mo- 
lecular mechanisms by which succinate 
exerts an oncogenic or antitumor effect.

It is not clear how succinate is exported 
from mitochondria to act in the nucleus 
and alter gene expression. Mitochondrial 
dicarboxylate carrier (SLC25A10) is the mitochondrial succinate carrier (10), but whether
it is involved and whether alterations of its 
activity are implicated in the proposed succi- 
nate-histone demethylase-immunogenicity 
axis remain unexplored. Furthermore, succi- 
nate might diffuse into the nucleus through 
pores in the nuclear membrane. However, 
the mechanisms that maintain the two dis- 
tinct pools of succinate—mitochondrial and 
nuclear—and prevent unnecessary interor- 
ganelle translocation remain elusive. As a 
possible explanation, recent studies have 
identified a noncanonical tricarboxylic acid 
cycle in the nucleus, which produces succinate locally (11, 12). Nevertheless, it remains
unclear how timely and precise transport 
of succinate from the mitochondria to the 
nucleus is achieved. Indeed, mitochondria, 
cytosol, and the nucleus differ in their 
biophysical properties, such as viscosity 
(13, 14), which might be a barrier to inter- 
organelle trafficking. 

The findings of Mangalhara et al. are 
exciting on multiple levels. The identifica- 
tion of a connection between mitochondrial 
metabolites and the epigenetic regulation 
of immunogenicity could have broad impli- 
cations for immunology and for the study 
of conditions characterized by succinate ac- 
cumulation, such as ischemia-reperfusion 
injury (5). These findings also suggest that 
targeting mitochondrial metabolism could 
be an effective approach for cancer immu- 
notherapy, to boost the effects of immune 
checkpoint inhibitors. 
Ultrathin membranes to sieve gases

Mixed-matrix membranes could enable gas separation for carbon capture


By Ziqi Yang and Dan Zhao 


Membranes for hydrogen-carbon dioxide (H₂-CO₂) gas separation have potential applications as energy-efficient components in carbon capture technology. Mixed-matrix membranes (MMMs), which are
composed of polymers and filler materials 
such as metal-organic frameworks (MOFs), 
are used to achieve a good separation perfor- 
mance by leveraging the intrinsic benefits of 
each component. However, developing robust 
interfacial compatibility between polymers 
and filler materials is a critical challenge. On 
page 1350 of this issue, Chen et al. (1) report 
a solid-solvent processing (SSP) strategy to 
fabricate defect-free MOF-based MMMs with 
ultrathin selective layers and high filler load- 
ings that are capable of separating gas pairs 
with subangstrom precision. 

Permeance and selectivity are the primary 
parameters for evaluating the performance 
of gas-separation membranes. Permeance 
measures how fast gas molecules penetrate 
through a membrane, which varies inversely 
with membrane thickness. Permeance is 
often measured in gas permeation units
(GPU; 1 GPU = 3.35 × 10⁻¹⁰ mol m⁻² s⁻¹ Pa⁻¹).
Selectivity measures the extent to which the 
desired gas is separated from other gases. 
For H₂-CO₂ separation, membranes with an H₂ permeance larger than 300 GPU and H₂-CO₂ selectivity larger than 30 make membrane systems attractive compared with the conventional Selexol process, which uses a liquid solvent to remove CO₂ from a gas mixture (2). Membrane materials with strong size-sieving characteristics are beneficial for improving H₂-CO₂ selectivity, whereas reducing membrane thickness is effective for enhancing permeance (3, 4).
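The quantities in this paragraph can be checked with a few lines of Python. This sketch assumes the standard relation permeance = permeability / thickness, with permeability expressed in Barrer (a unit not used in the text), so that 1 Barrer per micrometer equals 1 GPU; the 100-Barrer layer is hypothetical. The thresholds are the 300 GPU and selectivity-of-30 targets quoted above. The 44.3 selectivity reported later in this article is a mixed-gas value, so the ideal (single-gas) selectivity helper here is only a simplified stand-in.

```python
# Conversion quoted in the text: 1 GPU = 3.35e-10 mol m^-2 s^-1 Pa^-1.
GPU_TO_SI = 3.35e-10

# Screening targets quoted in the text for H2-CO2 membranes vs. the Selexol process.
MIN_H2_PERMEANCE_GPU = 300.0
MIN_H2_CO2_SELECTIVITY = 30.0


def gpu_to_si(permeance_gpu_value: float) -> float:
    """Convert a permeance from GPU to mol m^-2 s^-1 Pa^-1."""
    return permeance_gpu_value * GPU_TO_SI


def permeance_gpu(permeability_barrer: float, thickness_um: float) -> float:
    """Permeance = permeability / thickness.

    With permeability in Barrer and thickness in micrometers the result is in GPU,
    because 1 GPU = 1 Barrer per micrometer.
    """
    if thickness_um <= 0:
        raise ValueError("thickness must be positive")
    return permeability_barrer / thickness_um


def ideal_selectivity(permeance_fast: float, permeance_slow: float) -> float:
    """Ideal (single-gas) selectivity: ratio of the two permeances."""
    return permeance_fast / permeance_slow


def meets_targets(h2_permeance_gpu: float, h2_co2_selectivity: float) -> bool:
    """True if both screening targets quoted in the text are exceeded."""
    return (h2_permeance_gpu > MIN_H2_PERMEANCE_GPU
            and h2_co2_selectivity > MIN_H2_CO2_SELECTIVITY)


# Thinning a hypothetical 100-Barrer selective layer from 1 um to 100 nm
# raises its permeance tenfold.
print(permeance_gpu(100.0, 1.0), permeance_gpu(100.0, 0.1))

# Values reported for the Chen et al. MMMs, checked against the targets.
print(gpu_to_si(1788.0), meets_targets(1788.0, 44.3))
```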

Most polymeric membranes are unable 
to maintain an acceptable balance between 
both high permeance and selectivity (5). 
Although inorganic membrane materi- 
als, such as MOFs, exhibit high permeance 
and selectivity, they are difficult to use in 
large-scale fabrication processes because of 
their mechanical fragility and high cost (6). 
Integrating inorganic filler materials into a polymeric matrix to engineer MMMs provides an alternative that combines the processability of polymers with the high separation properties of filler materials. To realize this potential, it is necessary to fabricate MMMs with thin
selective layers and high filler loadings in 
an easy and scalable manner. This is espe- 
cially important for fabricating membranes 
with sub-100-nm thickness and filler load- 
ings exceeding 50 vol %. However, interfacial 
defects stemming from the incompatibility 
between polymers and filler materials tend 
to be exacerbated under such membrane 
configurations (6). 

To prepare MMMs with high permeance 
and selectivity, Chen et al. used the SSP strat- 
egy for the in situ conversion of copper metal 
salts (precursors to MOF crystals) within 
the polymeric matrix. A similar strategy was 
applied to fabricate pure MOF membranes 
(7). Specifically, the SSP strategy combines 
a coating process to form an ultrathin metal 
salt polymer layer on a porous substrate and 
a ligand vapor treatment to trigger the in 
situ synthesis of the MOFs (see the figure). 
The fabricated MMMs achieved high MOF 
loadings of up to 80 vol % with a membrane 
thickness of 50 to 100 nm. Although previous 
studies have managed to develop MMMs that 
feature either high filler loadings or ultrathin 
selective layers, the convergence of the two 
attributes has rarely been reported. 

The MMMs fabricated by Chen et al. showed H₂ permeance of 1788 GPU and mixed-gas (50/50 vol %) H₂-CO₂ selectivity of 44.3 at 120°C and 1.5 bar, which is superior to most of the reported MMMs (8). This high performance stems from the judicious selection of the MOF and the membrane fabrication method. The selected MOF, based on copper metal ion, pyrazine, and hexafluorosilicate ion [Cu(SiF₆)(pyz)₂], possesses an effective aperture size of 2.5 by 2.2 Å, which is large enough for H₂ but not CO₂ to percolate (9). Using the SSP approach, the molecular
sieving MOF was successfully integrated into 
a polymeric matrix with filler loadings above 
50 vol %. As a result of this high filler load- 
ing, gas transport through the MMM was 
predominantly governed by the MOF phase, 
leading to H₂-CO₂ selectivity above 40.

Ideally, the selected polymer should ex- 
hibit preferential selectivity for the same 
gas component as the filler material. For example, polymers with rigid structures are beneficial for H₂-CO₂ separation. However, Chen et al. considered other factors that affect the overall membrane fabrication and
quality. Hydrophilic polymers with flexible 
chains, such as the selected polyethylene 
glycol (PEG) and polyvinyl alcohol (PVA), 
may exhibit a better ability to dissolve metal 
salts during the coating process, seal grain 
defects, and regulate MOF sizes during the 
in situ synthesis. 

An advantage of the SSP strategy is the feasibility of using three-dimensional (3D) framework fillers to form ultrathin membranes. By contrast, current strategies for hybrid membrane fabrication rely predominantly on 2D filler materials (10, 11). Achieving such morphology presents additional constraints, including complicated fabrication methods. Using 3D instead of 2D materials can potentially yield a greater diversity of channel shapes and pore geometries, which can be exploited in challenging gas separations such as that of propylene-propane. SSP could also avoid some persisting processability issues and permit the exploration of other industrially applicable membrane configurations. For example, hollow fibers have large effective surface areas but are difficult to produce by using 2D materials. To demonstrate this point, Chen et al. fabricated an ultrathin hollow-fiber MMM that still exhibited high H₂ permeance of about 1000 GPU and high H₂-CO₂ selectivity of about 30, underlining the processability and scalability of this approach.

Further improvements involve expanding the selection of filler materials to a broader range of MOFs or other porous materials, especially those with rigid pore structures. In addition, it is necessary to ensure good matching among MOFs, polymers, and substrates for good processability. Strong adhesion with the porous substrate is helpful in preventing membrane delamination and stress at the membrane-support interface, especially for large-scale fabrication. The in situ conversion temperature for MOF growth should align with the polymer glass transition temperature. High conversion temperatures would increase polymer fluidity, affecting the MOF nucleation process and potentially resulting in unfavorable MOF particle sizes. Future studies should aim to evaluate membrane performance under industrially relevant conditions to facilitate large-scale membrane fabrication in a cost-effective manner.

Mixed-matrix membrane fabrication
Conventional mixed-matrix membrane (MMM) fabrication fails to achieve simultaneous ultrathin layers and high loading of fillers, such as metal-organic frameworks (MOFs), because of poor interfacial compatibility. This results in low gas selectivity. Solid-solvent processing enables the fabrication of defect-free ultrathin MMMs containing high MOF loadings that allow effective gas separation.
MOLECULAR BIOLOGY


Repetitive DNA regulates gene expression

Short tandem repeats affect gene expression by binding regulatory proteins


By Thomas E. Kuhlman 


Approximately 5% of the human genome consists of short tandem repeats (STRs), sequences in which repeating units of 1 to 6 base pairs form arrays up to 100 base pairs long (1, 2).

Variations in STR length are associated 
with, and often the causative agents of, gene 
expression changes present in some heredi- 
tary conditions, for example, Huntington’s 
disease, autism, and schizophrenia (3-5). 
However, the mechanisms through which 
STRs exert their effects on gene expression 
remain poorly understood. On page 1304 of 
this issue, Horton et al. (6) demonstrate that 
STRs exert their effects by directly binding 
transcription factor proteins, thus explaining 
how STRs might influence gene expression in 
both normal and diseased states. 

STRs are enriched within regions of the 
genome that are associated with controlling 
the expression of genes (7). In these regions, 
transcription factors bind to specific recog- 
nition sequences in the DNA and to each 
other in complex arrangements to regulate 
gene expression. Horton et al. used high-throughput in vitro microfluidic assays
combined with more traditional molecular 
biology techniques and theoretical and com- 
putational modeling to demonstrate that 
two example transcription factor proteins 
that have similar recognition sequences— 
phosphate system positive regulatory pro- 
tein Pho4 from the yeast Saccharomyces 
cerevisiae and MYC-associated factor 
X (MAX) from humans—bind directly, 
though weakly, to STRs. The authors dem- 
onstrate that this direct binding to STR 
sequences that surround specific recog- 
nition sites results in a higher affinity of 
transcription factor binding. Furthermore, 
they showed that binding of transcription 
factors to STRs occurs through the same 
mechanisms that are involved in their 
binding to specific recognition sites.
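How weak but numerous STR sites can raise binding within a regulatory region can be illustrated with a minimal equilibrium model. This Python sketch is not Horton et al.'s statistical model; it assumes a single transcription factor that occupies at most one site at a time, one core site plus n equivalent STR sites, hypothetical free energies in kcal/mol, RT ≈ 0.593 kcal/mol (about 298 K), and a concentration expressed relative to a 1 M reference. Because n equivalent states contribute n times the weight of one, their combined free energy is ΔG − RT ln n, which is the entropic effect of repeated binding states.

```python
import math

# Thermal energy RT at about 298 K, in kcal/mol (assumed temperature).
RT_KCAL = 0.593


def boltzmann_weight(delta_g: float) -> float:
    """Statistical weight exp(-dG/RT) of one bound state (dG in kcal/mol, negative = favorable)."""
    return math.exp(-delta_g / RT_KCAL)


def region_occupancy(relative_conc: float, dg_core: float, dg_str: float,
                     n_str_sites: int) -> float:
    """Probability that a single TF is bound somewhere in a region containing one core site
    and n_str_sites equivalent, weaker STR sites (single-occupancy equilibrium model)."""
    bound_weight = relative_conc * (boltzmann_weight(dg_core)
                                    + n_str_sites * boltzmann_weight(dg_str))
    # Partition function: 1 (unbound) plus the summed weights of all bound states.
    return bound_weight / (1.0 + bound_weight)


def ensemble_free_energy(dg_str: float, n_str_sites: int) -> float:
    """Effective free energy of n equivalent STR states: dG - RT ln n."""
    return dg_str - RT_KCAL * math.log(n_str_sites)


# Hypothetical values: a strong core site and much weaker STR sites.
for n in (0, 10, 40):
    print(n, round(region_occupancy(1e-3, dg_core=-5.0, dg_str=-2.0, n_str_sites=n), 3))
```

Because this toy model lets only one protein bind, it cannot show the simultaneous binding of several factors that STR arrays also permit.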
Previous work used models that describe 
how specific parts of transcription factor pro- 
teins directly interact with DNA base pairs 
to predict the strength of the interaction 
but failed to predict the observed affinity of 
transcription factors for STRs (8). However, 
Horton et al. incorporated data on the affinity
of Pho4 and MAX to STR-like DNA oligonu- 
cleotides (9-11) into statistical models of pro- 
tein-DNA binding to quantify the observed 
transcription factor-STR interactions. Once extended to describe other transcription factors, these models could potentially be used to understand how STRs tune transcription factor binding to precisely regulate gene expression. These models could also provide information on the effects of STR arrays on transcription factor binding at other loci and on how this function changes in disease states or how STRs could be used to modify the behavior of synthetically designed gene expression systems.

Horton et al. demonstrate that STRs exert their effects through a variety of physical mechanisms (see the figure). Because they are larger than the recognition sequence, STR arrays can bind multiple transcription factor proteins, which increases their protein concentration near the recognition site. Furthermore, the repeated nature of the STRs exerts a purely statistical effect: By allowing for multiple identical binding states, the STRs reduce the entropy cost of localizing a protein at a single recognition site, which results in a more energetically favorable bound state. Horton et al. found that STRs also increase the affinity of Pho4 and MAX for their recognition sites by reducing the average time it takes for the transcription factors to locate and bind to them. The probabilities of transcription factors being bound at specific recognition sequences ultimately determine the expression levels of the regulated genes (12). Therefore, how transcription factors find and bind to their specific target sequences is one of the fundamental questions of biological physics (13) and remains an area of intense research.

Short tandem repeats alter transcription factor binding
Short tandem repeats (STRs) influence transcription factor (TF) protein binding at core gene regulatory sequences through a combination of factors. The binding of TFs to STRs is weaker than it is to core regulatory sequences but stronger than it is to random genomic sequences, thus increasing the size of the energetically favorable binding area, as indicated by a reduction in the change in Gibbs free energy (ΔΔG) associated with binding (1). This increase enables the simultaneous binding of multiple TFs close to the core binding site, which promotes the recruitment of TFs to the core sequence (2). STRs have low sequence complexity, so TFs with different target sequences could be recruited to the same STR (3). The repeats in the STRs result in multiple equivalent TF binding states that reduce the entropic cost of localizing TFs to specific binding sites (4). STRs increase the rate at which TFs can find and bind their target sequences (k_on), but the dissociation rate (k_off) is largely unaffected (5).

The traditional view of how a transcription factor protein finds its recognition sites is that during the search process, it can exist in only two states: The protein is either specifically bound to a recognition site or engaged in a complex search process that involves nonspecific binding and probing of the DNA as it looks for those sites. Hence, the rate at which a transcription factor binds (the “on rate”) should not depend on the DNA sequence. Consequently, how “tightly” a transcription factor binds to its recognition site should mostly be a function of the rate at which it falls off once bound (the “off rate”), which is determined by how closely the sequence that it is bound to resembles the sequence of a recognition site. By contrast, and consistent with other recent work in bacterial systems (14), Horton et al. found that STR arrays mostly affect the on rate by introducing additional possible “quasi-specific” states to the search process, whereas the off rate is largely unaffected. This surprising result suggests that STRs also affect binding kinetics, thereby playing a role in enhancing the speed at which organisms can respond to changing environments. However, the microscopic details of exactly how STRs affect transitions between specific, quasi-specific, and nonspecific states of the search process of transcription factors remain to be explored.

STR arrays have low sequence complexity, which means that disparate transcription factors that normally bind substantially different recognition sequences may be capable of binding the same STR with similar affinities, or, alternatively, similar transcription factors could have different affinities for STRs. Hence, STRs could bind a host of different transcription factors, thereby accumulating proteins that are not directly involved in regulating expression of the associated gene. STRs can also have higher mutation rates than the overall genome (15), which could enable them to evolve into specific binding sites for transcription factors.

Based on their models and the known binding characteristics of transcription factors, Horton et al. predict that ~1300 transcription factors from 114 diverse species will interact with STRs in a similar manner to that observed for Pho4 and MAX. This work should serve as a motivating entry point for future experiments and modeling to explore transcription factor and STR interactions and their effects on search-and-response kinetics. Other open questions include how these regulatory STR sequences evolve and how STRs and their variations result in the development of human disease. The latter point, in particular, could provide information on how to design or alter the expression of genes to behave in more desirable ways or to correct STRs associated with disease.
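The kinetic argument in this article, that STR arrays mainly raise the on rate while leaving the off rate largely unchanged, can be sketched with the relation K_d = k_off / k_on for simple one-step binding. All numbers below (10 nM free factor, k_off = 1 s⁻¹, and a fivefold on-rate increase) are hypothetical and were not reported by Horton et al.; the sketch also ignores the multistate search process the authors model. It shows that a faster on rate both shortens the mean search time 1/(k_on [TF]) and lowers K_d, so equilibrium occupancy rises even though k_off is unchanged.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d = k_off / k_on for simple one-step binding (k_on in M^-1 s^-1, k_off in s^-1)."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fractional_occupancy(conc_m: float, k_d: float) -> float:
    """Equilibrium fraction of sites bound at free TF concentration conc_m (M)."""
    return conc_m / (conc_m + k_d)


def mean_search_time(k_on: float, conc_m: float) -> float:
    """Mean time (s) for an unoccupied site to be found: 1 / (k_on * [TF])."""
    return 1.0 / (k_on * conc_m)


# Hypothetical numbers: STRs speed up the search fivefold and leave k_off unchanged.
tf_conc = 1e-8   # 10 nM free transcription factor
off_rate = 1.0   # s^-1
for label, on_rate in (("no STR", 1e8), ("with STR", 5e8)):
    k_d = dissociation_constant(on_rate, off_rate)
    print(label, k_d, round(fractional_occupancy(tf_conc, k_d), 3),
          mean_search_time(on_rate, tf_conc))
```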
DRUG POLICY

Modeling cartel size to inform violence reduction in Mexico


Estimating stocks and flows is an innovative first step 


By Jonathan P. Caulkins, Beau Kilmer,
Peter Reuter


Immense violence and corruption in Mexico, and their connections to illegal drugs in the United States, are a great problem of our time. Mexico’s homicide rate in 2022 was 25 per 100,000, similar to Colombia’s and more than triple the US rate. Measuring corruption is notoriously difficult, but some Mexican criminal organizations have a history of intimidating and bribing government officials (1). On
page 1312 of this issue, Prieto-Curiel et al. 
(2) take on two important tasks: estimating 
how many people are employed by, and flow 
into and out of, Mexican criminal organiza- 
tions responsible for much of the violence 
and corruption, and creating a model that 
permits “what-if” analysis of policy inter- 
ventions. Concluding that increasing incar- 
ceration will lead to higher criminal em- 
ployment and violence, the authors argue 
that restricting organizations’ ability to re- 
cruit, such as by offering better alternative 
employment, is “the only way to lower vio- 
lence in Mexico.” 

These organizations (usually called “car- 
tels” although they do not meet the eco- 
nomic definition of the term) participate 
in multiple illegal activities, but traffick- 
ing drugs such as cocaine, fentanyl, heroin, 
and methamphetamine is thought to ac- 
count for a large share of their revenues (3). 
Prieto-Curiel et al. provide the first system- 
atic estimates of individual cartel sizes and 
total cartel employment. Prior to their pa- 
per, there were only expert guesses at the 
size of a few of the more prominent orga- 
nizations. The article accomplishes this in 
part by assembling a variety of data that 
had been accessible but scattered and also 
by integrating those data through a stocks 
and flow model. The data included official 
government statistics on the number of
homicides, missing persons, and incarcera- 
tions, as well as data from open sources on 
the number of cartels and their distribution 
across states gathered by the social science 
research organization Programa de Política
de Drogas. The model is an important con- 
tribution, as there had previously been few 
serious attempts to write down equations 
that capture the “physics” of what drives car- 
tel size or violence. 


ESTIMATING STOCKS AND FLOWS 

Stocks and flows models are common across 
many scientific disciplines but remain rela- 
tively rare in the study of crime and drug 
policy. Even the pivotal recognition that the 
population of people with opioid use disor- 
der (OUD) is a dynamic system is relatively 
recent (4). The associated idea that long-term
reductions in opioid overdose deaths
require reducing the flow into the stock of
people with OUD (5) is parallel to Prieto-Curiel
et al.’s focus on reducing the flow of
employees into the cartels.

There are nuances with counting the stock 
of cartel employees. For example, does the 
stock represent the number of people who 
were involved within the last year, even if 
only briefly, or just the regular workers? [It 
has long been recognized that, at least in US 
drug markets, many more individuals work 
in drug distribution than there are “full-time 
equivalent employees” because there is so 
much part-time or “gig” work (6)]. And does 
the stock count only traffickers or also those 
involved in drug production (from farmers 
growing poppies to chemists synthesizing 
fentanyl)? Nonetheless, the concept of study- 
ing these organizations’ labor force as a stock 
with associated inflows and outflows is an 
innovative and interesting perspective. 

In round terms, the analysis by Prieto- 
Curiel et al. implies that 


175,000 staff in 2022 = 115,000 staff in 2012 
- 110,000 lost to outflows + 170,000 recruits 


where the outflows are from violence, incarceration,
and all other exits, which they
aggregate under the term “saturation.”
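In accounting terms, the balance above says that the ending stock equals the starting stock minus cumulative outflows plus cumulative inflows. The Python sketch below simply restates that identity with the rounded figures quoted in the text. The function names are illustrative, and the per-year averages assume, purely for illustration, that the 2012 to 2022 flows were spread evenly across 10 years, which the model itself does not claim.

```python
def ending_stock(start: int, outflows: int, inflows: int) -> int:
    """Stock-and-flow identity: end = start - outflows + inflows."""
    return start - outflows + inflows


def average_annual(total: int, years: int) -> float:
    """Average yearly flow if a cumulative total were spread evenly (illustrative only)."""
    return total / years


# Rounded figures quoted in the text (2012 -> 2022).
stock_2012 = 115_000
outflows = 110_000   # violence + incarceration + "saturation" exits, combined
recruits = 170_000

stock_2022 = ending_stock(stock_2012, outflows, recruits)
years = 2022 - 2012

print(f"Implied 2022 stock: {stock_2022:,}")
print(f"Average recruits per year: {average_annual(recruits, years):,.0f}")
print(f"Average exits per year: {average_annual(outflows, years):,.0f}")
```

The net inflow of 60,000 (170,000 recruits minus 110,000 exits) is exactly the growth from 115,000 to 175,000.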

The outflows from death and incarceration
are not measured perfectly (e.g., it is
not always easy to know who was or was 
not a cartel member), but they are known 
at least roughly. Being able to “scale” them 
relative to the previously unknown stock 
size gives a useful sense of perspective. For 
example, if, as the paper claims, there are
now about 175,000 cartel employees, and 
their annual risk of death or disappearance 
due to conflict is 6500/175,000 = 3.7% per 
year, that is about 1.5 times greater than 
the death risk for US men and women who 
served in World War II (overall, not just per 
year). It is also substantially higher than 
one estimate of the annual risk of being 
killed in drug markets in Washington, DC, 
when the crack cocaine market was near its 
height in the mid-1980s: 1.4% (6).
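The risk figure is a ratio of an annual flow to the stock it is drawn from. The short Python check below uses only the numbers quoted above: 6500 conflict deaths or disappearances per year against a stock of 175,000, compared with the 1.4% Washington, DC, estimate. The function name is illustrative, and the calculation treats the stock as roughly constant over the year, which is a simplification.

```python
def annual_risk(events_per_year: float, population: float) -> float:
    """Approximate annual per-person risk: yearly events divided by the stock at risk."""
    return events_per_year / population


cartel_risk = annual_risk(6_500, 175_000)   # figures quoted in the text
dc_crack_era_risk = 0.014                   # Washington, DC, estimate cited in the text

print(f"Cartel annual risk: {cartel_risk:.1%}")
print(f"Ratio to DC estimate: {cartel_risk / dc_crack_era_risk:.1f}x")
```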

Those figures would imply that, from the 
perspective of a typical cartel employee or 
prospective employee, incarceration risk is 
not the primary cost to be traded off against 
the benefits of criminal income, camarade- 
rie, and whatever other benefits they may 
perceive. Even if the probabilities of death 
and incarceration are about the same, the 
consequences of being a homicide victim 
presumably weigh more heavily. Doubling 
or tripling the incarceration rate might then
not decisively alter the balance of perceived 
pros and cons from joining a cartel—a per- 
spective entirely consistent with the authors’ 
pessimism about the limitations of “reactive” 
policies that rely on incarceration. 

That said, there are some questions about 
whether the model can really show that 
there were about 175,000 employees in 2022. 
To a first-order approximation, the model’s 
estimate of the cartels’ collective size should
scale to the numbers of cartel member deaths 
and incarcerations, which in turn are equal 
to (reasonably) well-measured national to- 
tals multiplied by weakly justified presump- 
tions by the authors that f = 10% of deaths 
and g = 5% of incarcerations are suffered by 
cartel members. If those two proportions f
and g turned out to be twice as large, then 
the cartels could all be twice as large while 
keeping the model basically the same. 

This is easiest to see from equation 1 of the 
supplementary material’s section on cartel 
size and parameter estimation. In that equa- 
tion, if one doubles the incarceration flows, 
violence flows, and cartel sizes (Cs) while 
halving the internally estimated parameters 
governing exits from conflict and saturation 
(θ and ω) and keeping the recruitment rate
parameter (ρ) the same, one gets exactly the
same differential equation. 
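The invariance argument can be checked numerically on a toy model. The sketch below is not the paper's equation 1. It assumes a simplified form in which each cartel i gains members by recruitment ρC_i and loses them to incarceration ηC_i, to conflict θC_i Σ_j S_ij C_j, and to saturation ωC_i². Under that assumption, doubling every C_i while halving θ and ω (and keeping ρ and η, so incarceration flows double along with cartel size) gives trajectories that are exactly twice the originals. All parameter values are made up.

```python
from typing import List


def rhs(c: List[float], rho: float, eta: float, theta: float,
        omega: float, s: List[List[float]]) -> List[float]:
    """Derivatives for a HYPOTHETICAL simplified model (not the paper's equation 1):

    dC_i/dt = rho*C_i - eta*C_i - theta*C_i*sum_j(S_ij*C_j) - omega*C_i**2
    (recruitment, incarceration, conflict exits, saturation exits)
    """
    out = []
    for i, ci in enumerate(c):
        conflict = theta * ci * sum(s[i][j] * cj for j, cj in enumerate(c))
        out.append(rho * ci - eta * ci - conflict - omega * ci ** 2)
    return out


def simulate(c0: List[float], rho: float, eta: float, theta: float,
             omega: float, s: List[List[float]],
             dt: float = 0.01, steps: int = 1000) -> List[float]:
    """Forward-Euler integration from c0; returns the final cartel sizes."""
    c = list(c0)
    for _ in range(steps):
        dc = rhs(c, rho, eta, theta, omega, s)
        c = [ci + dt * dci for ci, dci in zip(c, dc)]
    return c


# Made-up parameters and a symmetric 3-cartel conflict matrix (illustrative only).
S = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.2],
     [0.5, 0.2, 0.0]]
rho, eta, theta, omega = 0.3, 0.05, 1e-6, 2e-6
c0 = [10_000.0, 5_000.0, 2_000.0]

base = simulate(c0, rho, eta, theta, omega, S)

# Double every cartel size, halve theta and omega, keep rho and eta unchanged.
scaled = simulate([2 * x for x in c0], rho, eta, theta / 2, omega / 2, S)

for b, sc in zip(base, scaled):
    print(f"base={b:12.2f}  scaled={sc:12.2f}  ratio={sc / b:.6f}")
```

In this toy setting, the rescaled system is the same equation measured in units of two members. Observed flows alone therefore cannot separate the two scenarios without outside information such as the f and g presumptions.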

Indeed, it is not clear why the total size 
of the cartels (C) combined with the f and 
g parameters is not essentially indeterminate. The authors’ figure 4 implies a highly
nonlinear relationship between the best- 
fitting C and the values of f and g, which 
is puzzling. Perhaps that emerges from the 
particular form of the matrix of S_ij intercartel
violence intensity parameters and/or
the assumption that initial cartel sizes
C_i follow a power-law distribution, either of
which would seem to be a highly indirect 
basis for determining the total scale of car- 
tel employment. 


“WHAT-IF” POLICY MODELING 

Even if it is granted that total cartel employment
is now about 175,000, the other
question the article tackles is even more ambitious.
It is one thing to build a model that
reproduces historical stocks and flows, and
quite another to have a model that continues
to capture well the stocks and flows in
a future world characterized by substantially
different policy conditions. Cartel members
are not billiard balls or atoms locked into
mechanistic reactions to external shocks.
Cartels are adaptive organizations often run
by intelligent people who can alter behavior
in response to changing conditions.


City Hall in Villa Unión, Mexico, is riddled with bullet holes from a gunbattle that began on 30 November 2019
between cartel members and security forces that left 22 people dead.


For example, even if over the period of
historical data (2012-2020) the cartels recruited
in a manner that is well captured
by the model’s constant per-capita recruiting
rate, ρ, cartel leaders could alter their
recruiting practices (e.g., offer to pay higher
wages) if that became necessary to counteract
some government initiative designed to
slow their recruiting.

There are historical instances of drug trafficking’s
growth being limited by scarcity
of certain skilled roles, such as a shortage
of chemists capable of synthesizing large
amounts of LSD circa 2000 after some major
LSD producers were shut down (7). But
we are not aware of any historical instances
in which a highly profitable drug-marketing
opportunity lay unexploited because criminal
enterprises were unable to recruit enough
workers more generally.

Consider this from the perspective of the
entire supply chain. Suppose the Mexican
government intervened to reduce the appeal
of working for cartels and the cartels needed
to counteract that by raising wages. If final
consumers in US markets could accept a 1%
increase in retail prices without major reductions
in purchases, how much would that free
up for incentive payments to Mexican cartel 
workers? Retail sales in US markets were 
about $150 billion per year in 2016 (8), so that 
1% price increase could generate about $1.5 
billion in additional revenue. That is more 
than $8500 per cartel member (at 175,000 
members), or almost the average gross do- 
mestic product per capita in Mexico. Surely, 
that would help maintain the workforce. 
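The back-of-the-envelope calculation above can be written out as follows. It uses the text's figures ($150 billion in annual US retail sales circa 2016, a 1% price increase, and 175,000 cartel members) and keeps the text's implicit assumptions: purchases do not fall appreciably when retail prices rise 1%, and the whole increase is available for payments to cartel workers. The function name is illustrative.

```python
def extra_per_member(retail_sales: float, price_increase: float, members: int) -> float:
    """Extra revenue per cartel member if retail prices rise by `price_increase`
    with no change in quantity purchased (the text's simplifying assumption)."""
    extra_revenue = retail_sales * price_increase
    return extra_revenue / members


retail_sales_usd = 150e9   # US retail drug sales, circa 2016, as cited in the text
increase = 0.01            # a 1% retail price increase
members = 175_000          # the paper's estimated cartel employment

print(f"Extra revenue: ${retail_sales_usd * increase / 1e9:.1f} billion")
print(f"Per member: ${extra_per_member(retail_sales_usd, increase, members):,.0f}")
```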

If cartels were for some reason unwill- 
ing or unable to raise wages, they might 
instead innovate to develop less labor-in- 
tensive ways of distributing drugs, perhaps 
sending fewer small shipments across the 
border by human couriers and sending 
more large shipments hidden in shipping 
containers. From 2013 to 2018, Chinese
producers shipped illegally manufactured
fentanyl (IMF) powder to the United States
largely through postal and parcel services,
which presumably required very few workers
(9). More generally, the model imagines
that recruitment is governed by a fixed
parameter, ρ, whereas one could argue
that recruitment is essentially a business 
decision taken by resourceful “managers” 
who will do whatever it takes to find the 
necessary staff. 


IDEAS FOR ADVANCING THE RESEARCH 
Although we raise these questions about the 
paper’s specific conclusions, we close by of- 
fering three ideas for advancing this general 
line of research, about which we are very 
supportive. 


Articulate which “first principles” drive 
cartel staffing and behavior 

The current model is something of a me- 
chanical black box and does not account 
for markets and incentives. That may not 
prevent it from generating credible popula- 
tion estimates, but an understanding of the 
underlying dynamics is essential for projec- 
tions and policy analysis. 

For example, the paper does not address 
why cartels’ staffing levels have been grow- 
ing—by 50% between 2012 and 2020 ac- 
cording to the model—when drug market 
trends might suggest the need for smaller, 
not larger staffs. The post-2012 develop- 
ment of legal cannabis supplies in many 
US states has reduced demand for Mexican- 
produced cannabis. Furthermore, cannabis 
prices have been falling (10), so revenues
from selling have fallen even more than 
the volume sold. Also, IMF became domi- 
nant in many US opioid markets over this 
period (9, 11). Although the vast majority of
IMF and heroin used in the US is produced 
in Mexico, and IMF may generate as much 
revenue for cartels as did heroin, its pro- 
duction and distribution probably require 
fewer Mexican workers. 

It is not known what happened to to- 
tal cartel drug revenues over this period 
(discussed below), but if they did decline, 
perhaps this prompted cartels to diversify 
into other lines of business that are more 
labor intensive such as illegal mining, kid- 
napping, human trafficking, and extortion. 
Or perhaps cartel labor is increasingly em- 
ployed to defend drugs and workers from 
attacks by rivals, or to intimidate officials, 
not to produce or distribute drugs. After 
all, total annual US consumption of (pure)
fentanyl circa 2021 was likely in the single
digits of metric tons (11), which is tiny compared
to the annual volume of other imported
goods (e.g., 1,000,000 metric tons of
Mexican avocados). The current model appears
to be silent about such possibilities.

Likewise, the model assumes that staff 
size drives violence: The more staff the 
cartels employ, the greater the number of 
killings of cartel members. But one could 
imagine a causal arrow running in the 
other direction. Perhaps cartels recruit 
more members in response to growing vio- 
lence, e.g., to deter attack or to retaliate ef- 
fectively. All this points to the field’s weak 
understanding of what fundamentally de- 
termines the levels of staffing and violence 
in drug markets. We hope that Prieto-Curiel 
et al.’s bold action of writing down equations
forces the literature to engage in more
explicit discussion about what fundamental 
principles drive cartel size and violence, 
even if that discussion ultimately leads the 
equations to be revised. 


Improve understanding of drug consumption and prices on both sides of the border

Scientific investigation of fundamental 
principles must be grounded in data, but 
there are currently severe—albeit remedi- 
able—deficiencies in relevant data systems. 
Understanding cartel dynamics requires 
understanding their biggest market: Drug 
consumption in the US. Estimating the 
size of illegal markets is a complex exer- 
cise requiring information from multiple 
sources; one cannot simply rely on na- 
tional household surveys because they 
miss most of the consumption of cocaine, 
fentanyl, heroin, and methamphetamine 
(12). However, starting in the 1990s, the 
Office of National Drug Control Policy 
(ONDCP) regularly produced estimates of 
US drug consumption, expenditures, and 
numbers of consumers by synthesizing 
a wide variety of data indicators, includ- 
ing information from the (since defunded) 
Arrestee Drug Abuse Monitoring (ADAM) 
program. Unfortunately, the most recent 
estimates only extend through 2016 (8). 
Estimates of Mexico’s drug export revenues 
are even older (2012) (13). 

Another key piece of the puzzle is un- 
derstanding how prices escalate as drugs 
move down the multilayered domestic 
distribution network that stands between 
Mexican cartels and people who use 
drugs. Most of the money that US con- 
sumers spend purchasing illegal drugs 
remains with the domestic distribution 
network. Cartel revenues are basically re- 
tail sales revenue multiplied by the ratio 
of cartels’ high-level wholesale prices di- 
vided by the (much higher) retail price. 
Estimating prices by market level is not 
easy, especially given the need to adjust 
for drug-purity levels across the supply 
chain, but it is possible. Data about seizures
and undercover buys are recorded
in databases such as the National Forensic 
Laboratory Information System (NFLIS) 
and the Drug Enforcement Administration’s 
System to Retrieve Information from Drug 
Evidence (STRIDE, now called STARLiMS). 
Additional insights lie in court documents 
from prosecuted cases. Methods exist for 
harnessing these data, and the US federal 
government used to regularly generate and 
report purity and price information, but 
the ONDCP price series stopped after 2012. 
The government could be doing much more 
to support analyses of illegal drug markets, 
but this is not a new recommendation (8, 9). 
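The revenue relationship described above (cartel revenue ≈ retail sales × wholesale price ÷ retail price) can be sketched as follows. Prices are first converted to a per-pure-gram basis because purity usually differs across market levels. The function names and every number in the example are hypothetical placeholders, not estimates, and the sketch assumes the same pure quantity moves through both market levels, ignoring seizures and other losses in between.

```python
def price_per_pure_gram(price_per_gram: float, purity: float) -> float:
    """Convert a bulk price to a purity-adjusted price (purity as a fraction in (0, 1])."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return price_per_gram / purity


def cartel_revenue(retail_sales: float,
                   wholesale_price: float, wholesale_purity: float,
                   retail_price: float, retail_purity: float) -> float:
    """Retail sales scaled by the ratio of purity-adjusted wholesale to retail prices.

    Assumes the same pure quantity passes through both market levels
    (no seizures or other losses in between).
    """
    ratio = (price_per_pure_gram(wholesale_price, wholesale_purity)
             / price_per_pure_gram(retail_price, retail_purity))
    return retail_sales * ratio


# Hypothetical placeholder numbers, for illustration only.
revenue = cartel_revenue(retail_sales=1_000.0,
                         wholesale_price=20.0, wholesale_purity=0.8,
                         retail_price=100.0, retail_purity=0.4)
print(f"Cartel share of 1,000 in retail sales: {revenue:.1f}")
```

With these placeholders, a pure gram costs 25 at wholesale and 250 at retail, so the cartel level captures one-tenth of retail spending.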


Expand scope beyond murders of cartel members

The authors’ focus on cartel size and vio- 
lence leads to consideration of a limited set 
of policy choices, primarily greater incarcer- 
ation or programs that reduce recruitment 
of cartel workers. Adopting a broader vi- 
sion of how cartels harm social welfare may 
expand the option set. Even granting that 
violence is the single greatest problem that 
cartels pose for Mexican society, the model 
only tracks lethal violence suffered by cartel 
members. It ignores violence perpetrated 
by cartels against others, including jour- 
nalists, politicians, and law enforcement. 
It also only tracks violent acts, whereas 
harm is also done by violence that is merely 
threatened, e.g., when business owners are 
extorted for “protection” money or political 
leaders cooperate and accept bribes instead 
of fighting cartels. 

There is no rigid rule connecting car- 
tels’ size—or at least the volume of their 
drug trafficking—with levels of violence. 
Consider that for many years prior to 2007, 
even though most of the illegal drugs ex- 
ported to the US passed through Mexico, 
the number of killings then was much 
lower, just one-third of the 2020 level. The 
standard explanation for the relatively mod- 
est level of violence associated with the car- 
tels then is that there had been a politically 
mediated settlement that corruptly linked 
the ruling Institutional Revolutionary Party 
(PRI) and the then small number of cartel
leaders. Felipe Calderón, whose 2006 election
to president was widely contested, attacked
the cartels to bolster his legitimacy.
That destabilized the agreement, leading to 
more than 15 years of violence. 

We are not arguing nostalgically for restoration
of Pax PRI, but want to point out
that there is nothing inherent in these massacres,
either to drug distribution or smuggling,
or to Mexico. As the costs of maintaining
protective armies grow, and as older
cartel leaders get killed off by their rivals
or the military, perhaps a new set of leaders


will prefer to come to some accommodation 
that reduces their costs. These markets are 
in fragile equilibria, if in equilibria at all. As 
suggested in the framework of Evolutionary 
Economics (14), the decisions by a few lead- 
ing figures may lead to quite different mar- 
ket outcomes. 

It is also worth noting that increasing the 
effectiveness of law enforcement is not syn- 
onymous with more incarceration. Kleiman 
(15) suggested that US agencies might be 
able to reduce violence in Mexico by focus- 
ing their enforcement efforts on US import- 
ers who buy from the most violent Mexican 
cartel. That would provide an incentive for 
Mexican cartels to avoid being known for 
violence, because US buyers would try to 
purchase from their less violent competitors. 
Kleiman acknowledged potential implementation
challenges, but the idea illustrates the
concept that intervening to reduce violence 
need not entail maximizing incarceration or 
minimizing cartel size. 

Finding ways to incorporate insights 
about markets and incentives into this 
type of model should be part of the next 
wave of research in this area. A richer understanding
of the underlying dynamics of
markets may help inform more effective 
policy innovation. 
New books, exhibitions, movies, and more


Thanks to Science intern Jamie Dickman, for the first time our fall roundup features previews of upcoming 
museum exhibitions, films, television series, and games. Play a charming board game inspired by a real breeding 
experiment, then learn how life unfolded on Earth in a new film. Read about how fire shaped the world, then visit 
an exhibition that traces the origins of environmentalism. Embark on a video game voyage through the Galaxy, 
then dive into an irreverent volume that probes the logistics of space settlement. Get out or stay in, but don’t miss 
the media presented here, which examine topics ranging from computer vision to creativity. —Valerie Thompson 


A City on Mars 


Reviewed by Gifford J. Wong


Whether you are part of an enterprising crew 
going where no one has gone before, tasked 
with guarding the Galaxy, or being tested to 
see whether you have the right stuff, the op- 
portunity to explore space dazzles and seems 
full of promise. But if we are truly consider- 
ing establishing neighborhoods of humanity 
beyond Earth, we need to seriously think
about a large number of potential snags. 

A City on Mars, by Kelly and Zach 
Weinersmith, helpfully pulls back the curtain 
on some of the lesser-discussed challenges 
to humanity’s off-Earth pursuits. Their effort 
appears all the more insightful when you con- 
sider how little the public knows about space 
and space settlement science; the seeming 
lack of inquiry “into the more squishy details 
of human existence,” including whether today’s
space law can effectively evolve to protect
our safety and ensure our rights; and the
reality that “blasting humans in a rocket to 
the Moon is substantially more impressive 
than creating detailed reports about how to 
turn poop and food scraps into wheat.” 

The Weinersmiths—self-professed “space 
geeks”—poured four years’ worth of painstaking
research, clear-eyed objectivity, and good-
natured humor into this book, which “begins 
with a Uranus joke and contains an explainer
on space cannibalism.” They offer a realistic
and holistic vision of how humanity might
constructively pursue space settlement, beginning
with a broad examination of how
space affects human bodies and minds and
ending with a discussion of some population-level
questions about the nature of early social
organization and whether access to space
changes society for the better.

Any reader enthusiastic about space settlement
will find much to appreciate in this
book. The Weinersmiths approach the topic
with objectivity and humor (e.g., on the serious
topic of procreation in low to zero gravity,
researched solutions include a special suit for
two that uses Velcro straps to “keep a couple
connected”), but most importantly, they write
with a confident belief that humanity will one
day travel off-planet. To do so, however, they
make the case that “we have got to become
wise if we want to go to the stars.”


A City on Mars: Can We Settle Space, Should We
Settle Space, and Have We Really Thought This
Through?, Kelly Weinersmith and Zach Weinersmith,
Penguin Press, 2023, 448 pp.


Friendlier foxes are a dice roll away
By Jamie Dickman


The Fox Experiment, a new board game inspired by a real-life scientific attempt to
domesticate silver foxes, will be released this fall. We spoke with the game's co-creator
Elizabeth Hargrave to learn more about how the game's designers incorporated science
into gameplay. This interview has been edited for brevity and clarity.


Q: The game is based on a domestication experiment in Russia that began in 1959
and is still ongoing. What inspired you to create a game about this project?
A: I don't know where board game ideas come from, but I started thinking about
how cool it would be to create a game where you are exploring genetics and
creating new generations of animals that are passing traits down. It’s not something
that I recalled seeing in a game before.

Q: Can you provide a brief overview of the game’s rules and objectives?
A: In the game, you breed foxes. You select foxes at the beginning of the round and
then you roll dice to see what traits their offspring have. You record the traits on
offspring cards, and then those cards are put back into the middle of the table and
they become parents for the next generation. Over the course of the game, you are
making more foxes. You are rolling more dice. And by the end of the game, you are
making foxes that have a bunch of these traits that were expressed in the actual fox
experiment, like floppy ears and curly tails.

Q: Domestication is a time-consuming process based on selective breeding for
traits such as size and temperament. How does the game simulate and incorporate
these features?
A: In the actual fox experiment in Russia, the experimenters were looking for the
friendliest foxes in each round, and all of these physical traits started showing up in
the foxes as a by-product of the experiment. We had to abstract away from the
genes and biochemistry that track along with those personality traits in the game
and let people select based on physical traits to keep things moving forward. We
do nod to the original experiment and say that the foxes with the most of these
expressed physical traits are also the friendliest foxes.

Q: How are the genetic mechanisms of domestication represented in gameplay?
A: I wanted the mechanic for how those traits get expressed to be dice because
genetics does feel like a roll of the dice. You don't know exactly what you are going
to get each time.

Q: How did you ensure that you had the science of the game right?
A: I read some of the original scientific papers, and I also read the book How to
Tame a Fox (and Build a Dog) co-written by Lee Dugatkin, who had actually gone
and interviewed some of the folks in Russia who worked on the experiment. I had
him review the rules for the game and all of the explanatory text to make sure that
he thought I captured the essence of the actual experiment.


The Fox Experiment, Elizabeth Hargrave and Jeff Fraser, Pandasaurus Games, 2023.
INSIGHTS | BOOKS


IN PERSON 


Forces of Nature: Voices That 
Shaped Environmentalism 

Portraits of scientists, political figures, 
activists, writers, and artists populate 
this upcoming exhibition at the National 
Portrait Gallery, which features more than 
25 likenesses of leading US figures who 
contributed to modern environmental 
thought, including Rachel Carson, George 
Washington Carver, Maya Lin, and 
Edward O. Wilson. Curated by historian 
Lacey Baradel, the exhibition illustrates 
how the featured individuals shaped the 
country’s complicated and often para- 
doxical attitudes toward the environment 
from the late 19th century to today. 


National Portrait Gallery, Smithsonian Institu- 
tion, Washington, DC, USA, 20 October 2023 to 
2 September 2024 


Turn It Up: The Power of Music 

After a successful run at Manchester's 
Science and Industry Museum, this inter- 
active exhibition exploring the mysteries 
of music will open this fall at the Science 
Museum in London. Highlights include 
an in-depth exploration of how music af- 
fects our emotions, interviews with musi- 
cians, a virtual instrument controlled by 
breath, and an organ powered by flames. 
Visitors can even craft their own musical 
compositions. 


Science Museum, London, UK, 19 October 2023 
to 6 May 2024


Find your inner musician in London.


Grand Egyptian Museum

After many delays, the highly anticipated
Grand Egyptian Museum (GEM)—the
world’s largest archaeological museum
dedicated to a single civilization—is expected
to open late this year or early next,
according to recent remarks made by
government officials. The GEM will feature
more than 50,000 artifacts, including a 
massive statue of Ramesses II and King 
Merneptah’s victory column; two galleries 
dedicated to items discovered in the tomb 
of Tutankhamun; and the Khufu ship, a 
large sailing vessel believed to have been 
intended for use in the afterlife. 


Cairo, Egypt, expected to open between 
November 2023 and February 2024 


Wondrous Space 

The Franklin Institute approaches its 
200th anniversary in 2024 with an 
ambitious reimagination of its perma- 
nent SPACE exhibit. Embracing a new 
era of space science, the revamped 
7000-square-foot exhibit will include a 
two-story gallery that invites visitors to 
explore space technologies, learn how 
space science benefits life on Earth, and 
envision themselves as integral to the 
cosmic journey. 


The Franklin Institute, Philadelphia, PA, USA, 
opens 4 November 2023 


Africa & Byzantium 

The Met's newest exhibition of nearly 
200 works, many never before exhibited 
in the United States, examines the lively 
exchange of Byzantine culture and art
in North Africa from the 4th to the 15th
century and beyond. The exhibition's
wide array of media, from mosaics and
metalwork to jewelry and manuscripts,
brings to light the underrepresented history
of early African Christianity and its
profound impact on the Byzantine world.
Focusing on cultural plurality, the exhibi- 
tion reveals Byzantium’s role in shaping 
global arts and ideas in late antiquity. 


The Metropolitan Museum of Art, New York, NY, 
USA, 19 November 2023 to 3 March 2024 


Lady Sapiens 


Reviewed by Katie Harris


In April 1966, more than 75 prominent schol- 
ars attended an international symposium 
at the University of Chicago titled “Man the 
Hunter.” That title disclosed the central con- 
ceit for decades of research that followed: 
that men and their big-game hunting were 
the key agents of human evolution. 

In Lady Sapiens, authors Thomas 
Cirotteau, Jennifer Kerner, and Eric Pincas 
undertake an ambitious effort to shed light 
on the underrepresented role of women in 
prehistory. The book challenges assumptions 
about prehistoric gender roles by weaving 
together anthropological evidence, archaeo- 
logical discoveries, and genetic insights to 
present an overview of what life may have 
been like for Upper Paleolithic women. 
Throughout, Lady Sapiens encourages read- 
ers to rethink their understanding of prehis- 
tory and to consider how our conception of 
gender roles and societal dynamics today 
shapes how we interpret the past. 

The authors employ a storytelling style 
that makes complex concepts more accessi- 
ble to a broader audience, potentially spark- 
ing interest among those who are new to the 
topic of human evolution. At times, however, 
this narrative style is also the book’s most 
notable shortcoming. Its heavy reliance on 
excerpts from interviews gives it the feel of 
a “just-so” story, leading the reader to won- 
der whether Lady Sapiens is just the script 
of the 2021 documentary of the same name. 
Likewise, although the authors refer to mul- 
titudes of evidence and research, there are no 
citations in the text, leaving readers to won- 
der how the selected bibliography relates to 
the body of the book. Even when writing for 
a lay audience, referencing sources not only 
lends credibility to the authors’ claims but 
also empowers readers to explore further and 
deepen their understanding of the subject. A 
more generous use of in-text citations would 
have allowed Lady Sapiens to maintain its 
accessibility while boosting its credibility. 

Despite this criticism, Lady Sapiens de- 
serves praise for its willingness to challenge 
the status quo and present an accessible 
alternative view of human prehistory. The 
book is most appropriate for nonexpert read- 
ers but may also be suited for an introductory 
human evolution course if accompanied by 
more in-depth case studies to help students 
understand how the conclusions presented 
in the book were reached. 


Lady Sapiens: Breaking Stereotypes About 
Prehistoric Women, Thomas Cirotteau, Jennifer 
Kerner, and Eric Pincas, Translated by Philippa Hurd, 
Hero, 2023, 240 pp. 
Ignition


Reviewed by Jamie Dickman


In Ignition, journalist M. R. O’Connor inves- 
tigates catastrophic wildfires, life’s paradoxi- 
cal relationship with fire, and humankind’s 
willingness to engage in risk. For decades, 
megafires have grown in frequency and in- 
tensity, heightening fire management needs. 
O’Connor quickly learns that fire is a contra- 
dictory process that “disturbs and kills and 
midwifes the new into being.” In places like 
North America, where wildfire suppression 
is prioritized, excessive underbrush has ac- 
cumulated. When fires spark here, as they 
did in Canada and the western United States 
this past summer, they have unprecedented 
amounts of fuel to consume, exceeding his- 
torical bounds in severity and temperature. 
Elsewhere, flammable non-native grasses are 
spreading to ecosystems that lack natural fire 
cycling, providing tinder for wildfires that 
were once infrequent in places such as Maui. 

O’Connor becomes trained in fire man- 
agement and joins wildland firefighting 
squads—who often pay heavy prices for their 
dangerous work—battling monstrous blazes 
with a militaristic approach. She meets pre- 
scribed burners and trailblazing ecologists 
who embrace the “fire paradox” and welcome 
the postburn benefits. 

A charred savannah looks dismal to the 
untrained eye, but many plants, animals, and 
insects in fire-dependent ecosystems struggle 
to survive when land managers leave these 
areas unburned. In a fire deficit, nutrient 
cycling is low, food- and shelter-producing 
plants struggle to regenerate, and animals de- 
pendent on these plants and niches dwindle. 

O’Connor untangles the Western idea of 
“wilderness” as pristine, untouched land— 
a perspective that frequently drives fire man- 
agement policy. She points out that when 
Europeans first set foot in North America, the 
majestic and “virgin” landscapes they found 
were intentionally shaped using fire. As early 
as 1634, in the wake of the disease-driven 
deaths of many Native Americans, an over- 
growth of underbrush was already noticeable 
enough to be recorded by settlers. The New 
World was never virgin, one newcomer real- 
ized. Instead, it was “widowed.” 

Many communities in the American South 
have stubbornly continued to execute pre- 
scribed burns despite federal anti-fire pro- 
grams. In the late 1990s, landowners’ right-to- 
burn laws and prescribed burning initiatives 
enabled a pro-fire framework that has helped 
to maintain longleaf pine forests and savan- 
nahs while simultaneously minimizing the 
likelihood of uncontrollable wildfires. 

Efforts to restore historical fire regimes 
are gaining traction, writes O’Connor: “This 


SCIENCE science.org 


movement of fire practitioners is happening
around the globe and is largely being led by
Indigenous communities, who speak of their
right as stewards of the land to once again
practice ‘cultural burning.’” In these endeav-
ors to reignite a positive relationship with
fire, she sees hope for the future.


Ignition: Lighting Fires in a Burning World, 
M. R. O'Connor, Bold Type Books, 2023, 384 pp. 


Nuts and Bolts 


Reviewed by Adam R. Shapiro


In the first chapter of Roma Agrawal’s Nuts 
& Bolts, the reader learns that however small 
they may be, nails are far from insignificant. 
They are integral to joinery—a fundamental 
technology upon which so many construc- 
tions and inventions rest—and, in Agrawal’s 
hands, the ability to manufacture, standardize, 
and distribute these small devices becomes a 
tale that connects science and society. 

The nail is one of seven technologies 
Agrawal discusses that together, she claims, 
form the basis of the modern world. The 
others are the wheel, spring, magnet, lens, 
string, and pump. To explain how these tools 
work, she explores optics, electromagnetism, 
and thermodynamics, making connections to 
mechanical force along the way. 

Agrawal is telling a story not just about 
great inventions but also about the societies 
that make and use them and the people who 
are affected by them. The magnet, for ex- 
ample, made long-distance communication 
possible, first by telegraphy, then radio, and 
eventually the internet. Agrawal integrates 
an account of the discovery of principles of 
electrical induction and the refinement of its 
use in communication with a history of the 
first telegraphy cables in India, which were 
used by Britain to exert control over its colo- 
nial claims. 

Like the early chains that added strength 
and flexibility to engineering designs, 
Agrawal has connected many links together 
to bind history to engineering and one part 
of the globe to another. In so doing, she offers 
readers a more global account of inventors 
and builders than is often presented, remind- 
ing them of the benefits that come from at- 
tracting and retaining diverse talent—efforts 
with which many fields of science and engi- 
neering continue to struggle. 

Nuts & Bolts does not require of its readers
in-depth technical knowledge, nor does it
presume prior fluency in global history.
Historians of technology and other scholars
might desire more depth (although there is
a good bibliography of further reading) but
may find it a fruitful text for teaching intro-
ductory courses. Ultimately, the book offers a
robust history that should speak to scientists’
and engineers’ sense of social awareness.


Nuts and Bolts: Seven Small Inventions That 
Changed the World in a Big Way, Roma Agrawal, 
Norton, 2023, 272 pp. 


Your Face Belongs to Us


Reviewed by Ashley Huderson


In June 2019, leading a group of collegiate 
scholars across China, I got thirsty while 
touring a university in Yichang and was di- 
rected to a vending machine by a student 
ambassador. Noticing my confusion about
how to use it, my escort allowed the machine 
to scan her face and selected water from the 
menu. The transaction was both intriguing 
and worrying. 

Facial recognition technologies like the 
one I encountered in Yichang are the focal 
point of journalist Kashmir Hill’s new book, 
Your Face Belongs to Us, which traces the rise 
of Clearview AI, a company that claimed in
2019 that it could identify anyone from one 
snapshot of their face with 99% accuracy. 

Hill’s chilling account charts the tech
startup’s evolution, from its modest origins
to its current status as the creator of highly
sought-after software that is used by both
government and private entities. She follows
the unlikely pairing of founders Hoan
Ton-That, an Australian entrepreneur, and
Richard Schwartz, an American politician,
who reportedly bonded over their shared
enthusiasm for Donald Trump at the 2016
Republican National Convention. Inspired by
their respect for libertarian venture capitalist
Peter Thiel and his call for tools to help “fix”
America, the pair began their journey into
facial AI soon thereafter. It would take a few
years for Clearview AI to materialize, but the
seed planted in 2016 would eventually lead to
the development of globally deployed software
that remains hidden in plain sight.

Hill weaves together tales of her efforts
to track down Ton-That and Schwartz with
reports and stories from law enforcement
agencies and municipalities that have
used the technology. She even visits the
address listed as the company’s office space but
discovers what appears to be an abandoned 
building. Her inquiries did not go unnoticed. 

In the end, Your Face Belongs to Us show- 
cases the blessing and the curse of technol- 
ogy, probing the age-old inventor’s question: 
“Just because we can, should we?” 


Your Face Belongs to Us: A Secretive Startup’s 
Quest to End Privacy As We Know It, Kashmir Hill,
Random House, 2023, 352 pp. 
Learning to Imagine


Reviewed by Josh Trapani


Playing with my 5-year-old during pandemic 
isolation, I sometimes uncharitably won- 
dered: Why are her made-up games so dread- 
fully boring? In Learning to Imagine, psy- 
chologist Andrew Shtulman calls it a myth 
that children are unbridled fonts of imagina- 
tion. To the contrary, the more we learn, the 
more imaginative we can become. Kids, he 
argues, fail to conceive of obvious possibili- 
ties. They enforce strict rules bound by lim- 
ited knowledge of physical and moral laws 
and prefer imitation to novelty. 

To marshal evidence for this argument, 
Shtulman explores mechanisms for expand- 
ing imagination. Without exposure to the 
testimonies of others or new tools and tech- 
nologies, people may reject plausible ideas 
out of hand, he maintains. Lord Kelvin, for 
instance, famously denied the possibility of 
“heavier-than-air flying machines” less than 
10 years before the Wright brothers created 
one. Abstract principles, like those so influen- 
tial in science and ethics, also bolster imagi- 
nation. For example, sexual selection explains 
traits that other theories—natural selection 
included—cannot. Finally, imagination grows 
through exploring alternative models of the 
world, as in playacting, fiction, or religious 
faith. Across all of these examples, expand- 
ing imagination requires building closely on 
what people already know. 

Perhaps counterintuitively, kids relate best 
to realistic stories. For instance, Walt Disney’s 
earliest cartoons were chaotic and surreal. 
Only when his animations became “plausi- 
bly impossible” did they garner mass appeal. 
Many fictional worlds, from Middle Earth to 
Hogwarts, rely on plausible impossibility. 

Shtulman ably and incisively navigates 
this vast, fascinating terrain. Learning to 
Imagine never drags or gets mired in jargon. 
Allusions to pop culture help: from Santa 
Claus to Elizabeth Holmes, from the Beatles 
to The Princess Bride. I wish, however, that 
there had been more focus on what these 
findings mean. If education helps rather 
than hinders imagination, how do we op- 
timize it? Shtulman advises: “engage with, 
and learn from, the collective knowledge of 
other people.” AI programs like GPT take 
that approach, educating themselves on 
massive datasets. 

“Be like GPT” is not the most heartening 
message. But while people cannot soak up
gobs of data like AI can, human imagination
is communal and collaborative. That, at least,
is something all of us, 5-year-olds and their
dads alike, have over the chatbots.

Fanciful ideas need a plausible premise.


Learning to Imagine: The Science of Discovering 
New Possibilities, Andrew Shtulman, Harvard 
University Press, 2023, 352 pp. 


Of Time and Turtles 


Reviewed by Rhema Bjorkland


While turtles have the largest fan base of 
any reptile group, their shrinking habitats 
are increasingly dissected by development 
and roads, making injuries and death major 
threats. Combined with unintended capture
in fishing gear, garden-variety thoughtless-
ness, and old-fashioned cruelty, these threats
have left many turtle species imperiled in the
United States and globally.

In Of Time and Turtles, naturalist Sy
Montgomery describes her experience as a
volunteer with the Turtle Rescue League in
Southbridge, Massachusetts. The book seeks
to connect the healing experienced by staff
and volunteers with the recovery of turtles.
Against the backdrop of her 60th birthday,
the COVID-19 pandemic, and the political
and social turmoil of 2020 and 2021, she
turned to turtles “to show me the path to wis-
dom and how to make my peace with time.”

Montgomery accurately describes the
long and exhausting days that turtle reha- 
bilitation specialists and researchers put in 
during nesting season, a time when turtles 
are most mobile and at risk. Rescuers and 
rehabilitators respond to huge volumes of 
rescue calls during this period and move to 
protect vulnerable nests and eggs. It is im- 
mensely difficult and emotionally exhaust- 
ing for rescue staff and volunteers to see, 
treat, release, and—in many cases—bury 
their turtle charges. 

Montgomery’s documentation of the al- 
most miraculous ability of turtles to recover 
from injuries that would doom most other 
vertebrates is not exaggerated. This resil- 
ience is the foundation for the Turtle Rescue 
League founders’ creed to “Never give up on a 
turtle” and the source of their enduring belief 
in saving them, one shattered shell at a time. 

The book captures the surprisingly spir- 
ited interactions that turtles display with hu- 
mans, and Montgomery sheds light on how 
and why humans who interact with turtles 
often come to feel like the creatures are fam- 
ily. Yet the anthropomorphism she employs 
is at times distracting and ultimately does 
the turtles a disservice. Their chelonian per- 
sistence is sufficiently awe-inspiring without 
characterizations that seek to imbue them 
with human qualities. 


Of Time and Turtles: Mending the World, Shell by 
Shattered Shell, Sy Montgomery, Illustrated by Matt 
Patterson, Mariner, 2023, 304 pp. 
The Worlds I See


Reviewed by Kimberly Boller


In her new memoir, The Worlds I See, com- 
puter scientist Fei-Fei Li brings human and 
machine capabilities together, interleav- 
ing her personal and scientific paths and 
describing her distinctive contributions 
in the realm of artificial intelligence (AI). 
Throughout the book, Li uses the metaphor 
of finding and following her own “North 
Star” to describe how she has followed her 
instincts throughout her career, asking and 
answering increasingly complex questions 
about the human visual system in her quest 
to advance computer vision. She documents 
what it took to create ImageNet, a large 
database of images that is both the prod- 
uct of her early efforts to train computers 
to identify and categorize objects accurately 
and a dataset that was critical for making 
computer vision a reality. 

Li immigrated to the United States from 
China at the age of 15 and made the life- 
defining choice to reject the allure of big 
money in management consulting after com- 
pleting her undergraduate degree in physics 
at Princeton. Instead, she pursued graduate 
studies at Caltech in engineering and neuro- 
science and later returned to academia after 
a stint at Google. Since 2009, she has served 
on Stanford University’s faculty. None of her 
career decisions came easily, as she weighed 
the potential benefits against the pressure of 
providing for her family and attending to her 
mother’s health needs. 

Central to Li’s account are the relation- 
ships that not only influenced her pursuit 
of science but also nudged her forward and 
instilled confidence. Her support system in- 
cluded a devoted high school math teacher 
and his family, mentors and mentees who re- 
mained close colleagues, and individuals who 
have rotated through her lab over the years. 

At times, Li leans toward a narrative style 
that implies everything fell neatly into place 
throughout her life, resulting in a feel-good 
“science wins” story. But her optimistic at- 
titude about both her career and the future 
of computer science is sincere and therefore 
understandable. Ultimately, her commitment 
to and belief in the just, ethical growth and 
deployment of AI to support human flourish- 
ing is exactly what is needed now. 


The viewpoints expressed in this review are the author’s own 
and do not represent an endorsement from or the policy or 
viewpoints of the American Psychological Association. 


The Worlds I See: Curiosity, Exploration, and
Discovery at the Dawn of AI, Fei-Fei Li, Flatiron,
2023, 336 pp. 
Periodical

This documentary challenges centuries- 
old societal stigmas surrounding the 
menstrual cycle, delving into the sci- 
ence and politics of menstruation and 
highlighting the evolving perceptions
of periods in popular culture. Featuring
interviews with physicians, stem cell 
researchers, athletes, journalists, and ac- 
tivists fighting against period taxes, the 
film rejects taboos and seeks to foster 
acceptance of, and greater interest in, 
this underexplored phenomenon. 


Lina Lyte Plioplyte, director, XTR and 
MSNBC Films, airs 19 November 2023 


Starfield 

Set in the year 2330, this highly an- 
ticipated video game invites players to 
embark on an epic journey as part of 
“Constellation,” a group of space explor- 
ers seeking rare artifacts throughout 
the Galaxy. Drawing inspiration from 
real space shuttle design and planetary 
phenomena, the game allows players 
to explore more than 1000 planets 
while navigating alien environments, 
building outposts, and advancing re- 
search projects. 


Bethesda Game Studios, releases 
6 September 2023 


Pain Hustlers 

In his 2022 book, The Hard Sell, journal- 
ist Evan Hughes documented the rise 
and fall of biotech startup Insys Thera- 
peutics, whose aggressive opioid sales 
practices led to a landmark criminal trial. 
In this dramatized version of Hughes's 
tome, Emily Blunt portrays Liza Drake,
a single parent struggling to make ends
meet who lands a job as a pharmaceuti-
cal sales rep and becomes entangled in a
racketeering scheme. The harsh realities 
of the company’s actions force Drake to 
weigh the ethics of her choices. 


David Yates, director, Netflix, releases 
27 October 2023 


Lessons in Chemistry 

Based on Bonnie Garmus’s best-selling
book of the same name, this series follows
the fictional chemist Elizabeth Zott, who
is fired from an all-male research facility in
the 1950s and reluctantly agrees to host
a televised cooking show. Her scientific
approach to cooking captures the attention of
the nation’s housewives, but social ten-
sions rise along with her show's popularity
as she dares viewers to challenge the
status quo.


Lee Eisenberg, showrunner, Apple TV+, 
first episode premieres 13 October 2023 


Life on Our Planet 

The biodiversity we see on Earth today is
a fraction of life's potential: An estimated
99% of all species that have ever lived on the
planet have gone extinct. This eight-episode
docuseries, narrated by Morgan Freeman, tells
the story of life's fight for survival, starting
with the first cell, unfolding through five
mass extinction events, and ending with
the sixth such event, which our planet
faces today. The series brings fossilized
organisms back to life on-screen, show-
casing unique evolutionary adaptations
while exploring ancient ecosystems.


Dan Tapster, Keith Scholey, and Alastair 
Fothergill, series producers, Netflix, releases 
25 October 2023 


Extinct creatures 
come alive in 
Life on Our Planet. 
The role of feral goats
in Maui fires 


The fire that destroyed the city of Lahaina 
on the Hawaiian island of Maui has killed 
more people than any other US wildfire in 
the past 100 years (1). In addition to poor 
land-use decisions, which facilitated the 
proliferation of combustible invasive species 
(2, 3), the effects of feral goat grazing on the 
landscape (4) played a central role in fueling 
the blaze. Removing invasives and restoring 
native plants is crucial to fire suppression 
efforts. To achieve these goals, Hawai‘i must 
better control the feral goat population. 

Feral ungulates, especially goats, have 
played a substantial role in the disappear- 
ance of Hawai‘i’s native dry forest ecosystem. 
First introduced in 1789 as a gift to King 
Kamehameha I (5), goats later escaped 
domestication. Feral goats are notoriously 
destructive, consuming native plants and 
stripping away the bark from native trees 
(6, 7). The damage goats have done to 
native species, combined with previous fires 
and the end of Maui's plantation industry, 
has left large swaths of land open for the 
spread of invasive fire-adapted species (2, 3), 
creating a cycle of ever-increasing fires and 
opportunities for invasive species. 

Intensive cattle grazing, one fire preven- 
tion option, would reduce the quantity of 
invasive species that serve as fuel for fires 
(8). However, this solution would require 
perpetual management and water alloca-
tion (9) and would undermine the global
push for reforestation (10) and carbon
sequestration. Furthermore, grazing inten-
sive enough to suppress fire would increase
the risk of erosion and flooding during 
winter rains, leave a barren landscape, and 
fail to support Hawai‘i’s native biota, itself a 
critical Hawaiian biocultural resource. 

Restoring Hawaiian dry forest as a
biocultural resource is a more sustain- 
able solution. Native Hawaiian dry forests 
burn less quickly than invasive shrubby 
grasslands (11), and plant communities 
with more canopy layers and deeper root 
systems prevent erosion and flooding. The 
ongoing restoration of native vegetation in 
areas such as the Hawaiian cultural reserve 
on Kaho‘olawe could not succeed without 
eradicating goats (12). Protecting native
species from goats will allow them to regen- 
erate, giving them a chance to replace the 
invasive species that benefit from the goats’ 
presence. To suppress and slow landscape- 
scale fires, communities should restrict 
feral goats to areas far from population cen- 
ters and implement landscape-scale native 
Hawaiian dry forest restoration. 
Daniel Rubinoff1* and Samuel M. ‘Ohukani‘ohi‘a Gon III2

1Department of Plant and Environmental
Protection Sciences, University of Hawai‘i,
Honolulu, HI, USA. 2The Nature Conservancy of
Hawai‘i, Honolulu, HI, USA.
Australia’s path to 30%
conservation by 2030 


Member nations of the Convention on 
Biological Diversity (CBD) committed to 17 
conservation targets (1), including the con-
servation of 30% of land and water by 2030
(“30x30”), and to the investment of US$200 
billion in annual funding, 85% for domestic 
measures. What they can deliver, however, 
depends on the complexities of domestic 
politics in different countries. As the con- 
straints in Australia demonstrate, generic 
guidelines (2) are useful but insufficient. 

Australia has high biodiversity and 
inadequate conservation (3). The national 
government collects taxes and distributes 
funds to states, but the state governments 
control most land, including forests, farm- 
land, and national parks. About 22% of 
Australia’s total land area of 770 million 
ha is currently protected (4); the coun- 
try needs to protect another 62 million 
ha to meet the 30% goal. The national 
government’s CBD commitment is not
enough to purchase sufficient land at 
market rates (5). 
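
As a rough check on the shortfall quoted above, assuming the 30% target is applied to the 770 million ha land area alone (marine area is left out of this sketch, and "about 22%" is treated as exactly 22%):

$$
(0.30 - 0.22) \times 770\ \text{million ha} = 0.08 \times 770\ \text{million ha} = 61.6\ \text{million ha} \approx 62\ \text{million ha}
$$

The result is consistent with the roughly 62 million ha of additional protection the letter cites.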

Because CBD targets are the responsibil- 
ity of the national government, and pur- 
chasing the land is not feasible, alternative 
approaches will be required. The govern- 
ment should create more offshore marine 
reserves, under national control (6). It 
should also provide cash grants to conser- 
vation trusts and nongovernmental organi- 
zations with a track record in establishing 
new reserves, such as BirdLife Australia 
and the Australian Wildlife Conservancy 
(7). Many Indigenous land use agree- 
ments already exist (8), with conservation 
benefits; the national government should 
initiate more. Legal and financial nego- 
tiations between national and state gov- 
ernments, using earmarked funds along 
with constitutional powers derived from 
international agreements, could expand 
national parks and remove private tourism 
accommodation, halt farmland vegetation 
clearance and old-growth logging, and 
declare new national forest reserves. 

Conservation covenants on private land 
are widespread in other countries (9), and 
Australia’s national government should emu- 
late them. The Nature Repair Market Bill 
2023 (10) attempts to do so, but poorly. The 
proposed law, which has not been passed,
sidesteps state government control of land 
by establishing biodiversity certificates 
separate from land titles. However, it lacks a 
financial driver—neither direct funding nor 
tax incentives are included in the proposal 
(11). Because the bill risks abuse of offsets 
(12), conservation scientists oppose it (10).

Conservation policy experts can design 
legal and financial instruments for each of 
the mechanisms open to Australia to fulfill 
its commitments. Other countries negotiat- 
ing their own domestic politics will need to 
conduct a similar process. 

Ralf Buckley 


Griffith University, Gold Coast, QLD 4222, 
Australia. Email: rbuckley@griffith.edu.au 
Enhance China’s urban
flood resilience 


Between 29 July and 1 August, Beijing expe- 
rienced its heaviest rainfall in at least 140 
years, resulting in severe urban flooding 
that caused 33 fatalities, substantial eco- 
nomic losses, and environmental damage 
(1). Urbanization (2) and climate change (3) 
have increased the frequency and intensity 
of urban floods, which have caused damage 
in other Chinese cities as well (4). Urban 
flood events, and the population exposed to 
floods, are expected to increase (5). China’s 
14th Five-Year Plan (2021-2025) emphasizes 
the importance of urban flooding resilience 
(6) but provides insufficient details about 
how to implement the changes. To prevent 
casualties and damage, cities in China must 
implement the goals outlined in the plan 
by enhancing the flood resilience of urban 
infrastructure. 

China’s unprecedented urbanization has 
led to a substantial increase in impervious 
surface areas (2), impeding the absorption
of rainwater into the ground and amplifying surface runoff.
Since 2014, the national government has 
proposed the construction of “sponge cit- 
ies” (7), which use natural solutions such as 
green spaces and waterways to help absorb 
water. Some new construction in wealthy 
areas has incorporated these strategies, but 
older neighborhoods and smaller cities and 
towns have been neglected (8). Populations 
in mountainous areas are more vulnerable 
to flood disasters than those in many big 
cities, yet they still rely on old infrastructure (9).

All Chinese cities, not just the biggest and 
wealthiest, need improved urban drainage 
and flood control systems. Urban drainage 
pipeline networks should be constructed to 
carry excess water away from population 
areas (10), and natural waterways that con-
nect the city to surrounding rivers and lakes
should be preserved. Natural flood channels
and green spaces that absorb water should
be integrated into city plans (11), and their
location and capacity should be determined 
based on the average volume of rainfall that 
each area has received in the past and is 
likely to receive in the future. Plans should 
take into account local conditions and allow 
for levees, seawalls, and other fortification 
as needed. 

Effective flood control infrastructure, 
whether new construction or retrofit- 
ting, requires substantial investment. The 
national government has budgeted about 7 
billion yuan (US$964 million) per year for 
resilience projects (12), but between 2019 
and 2021, one province in central China 
alone required 75 billion yuan (US$10.3 
billion), representing a massive funding 
gap (12). Therefore, China should adopt a 
diversified financing structure, in which the 
national government as well as provincial 
and city governments earmark money for 
this purpose. Public-private partnerships 
can mobilize the private capital needed to
invest in flood resilience projects. In addi- 
tion, flood control taxation should be initi- 
ated at all government levels. To use the 
funds most effectively, the national govern- 
ment, with regional government coopera- 
tion, should conduct city-level mapping 
and then dispense funds to the areas that 
require the most urgent action. Investing 
now to prevent future flooding will mini- 
mize the expense of repairing flood damage, 
as well as save lives. 
Longwu Liang, Mingxing Chen*, Jiafan Cheng 
Institute of Geographic Sciences and Natural 
Resources Research, Chinese Academy of 
Sciences, Beijing, China and College of Resources 
and Environment, University of Chinese Academy 
of Sciences, Beijing, China. 
MINING IMPACTS
Downstream effects of metal mines 


Mining for metals produces waste containing toxic elements such as mercury and arsenic.
Macklin et al. compiled global data on the locations of active and inactive metal mines 
and tailings dams, which hold mine waste. Using hydrologic models, they assessed river 
system contamination from mines and failed tailings dams and determined the flood- 
plains, people, and livestock that could be affected. Over 23 million people live on the 
~164,000 square kilometers of floodplains affected by mining. Although tailings dam failures
have massive local impacts, they are estimated to affect far fewer people than baseline contami-
nation from current or past mining activities. Increased global data and monitoring are needed
to fully understand the ecological and health impacts of this extractive industry. —BEL 


Science, adg6704, this issue p. 1345 


The Rio Tinto in Spain and the surrounding basin and groundwater are highly polluted 
by mine waste from millennia of metal ore extraction. 


A source of CO2
within Europa


Europa, an icy moon of Jupiter,
has a subsurface ocean beneath
a crust of water ice. Solid carbon
dioxide (CO2) has previously been
observed on its surface, but the
source was unknown. Two teams
analyzed infrared spectroscopy
of Europa from the James Webb
Space Telescope to investigate the
CO2 source. Trumbo and Brown
found that the CO2 is concen-
trated in a region with geology that
indicates transport of material to
the surface from within the moon,
and they discuss the implications
for the composition of Europa’s
internal ocean. Villanueva et al.
also identified an internal origin of
the CO2 and measured its 13C/12C
isotope ratio. They searched for
plumes of volatile material breach-
ing the surface but found less activity
than earlier observations had suggested.
Together, these studies demon-
strate that there is a source of
carbon within Europa, probably in
its ocean. —KTS

Science, adg4270, adg4155, this issue p. 1305, p. 1308


Targeted axonal 
regeneration 


Although several experimental 
approaches have shown positive
results in axonal regeneration after 
spinal cord injury (SCI), complete 
recovery of motor functions 
remains an elusive target. Squair 
et al. hypothesized that restoration 
of complete axonal projection of 
a selected neuronal population to 
its natural target could promote
better functional recovery. After 
using single-cell RNA sequencing 
to identify the most promising 
neuronal population, the authors 
showed that promoting axonal 
growth and path guidance to their 
natural target in this population 
restored walking in mice after 
complete SCI. By contrast, broad 
axonal restoration across the 
lesion had no effect, suggesting 
that a more targeted approach is 
necessary for functional recovery 
after SCI. —MMa

Science, adi6412, this issue p. 1338 


Regulation of epigenetic
silencing by RNA 
In addition to serving as a 
messenger, RNA regulates the 
process of transcription itself. 
Polycomb repressive com- 
plex 2 (PRC2), an epigenetic 
gene-silencing complex that 
is essential for cell differentia- 
tion and is often deregulated 
in cancers, is a prominent 
system for studying this RNA- 
mediated regulation. Song et 
al. used cryo-electron micros- 
copy to solve a structure of 
PRC2 bound to RNA, revealing 
that the RNA promotes dimer- 
ization of the protein complex. 
The domain of the catalytic 
subunit of PRC2 needed for 
binding nucleosomal DNA 
and engaging with its histone 
substrate is sequestered in the 
PRC2 dimer interface, explain- 
ing how RNA inhibits PRC2 
activity. —DJ 

Science, adh0059, this issue p. 1331


Shoving out a 
twisted molecule 


Chemists often strive to push 
reactions metaphorically uphill 
toward less energetically favor- 
able products. The challenge 
is to keep those products from 
rolling right back down. Gemen 
et al. report a clever tactic for 
twisting azobenzene into its 
higher-energy Z conformation. 
Specifically, they lured the more 
stable E isomer into a supra- 
molecular host, along with a 
photosensitizer. When visible 
light injects energy to induce 
the twist, the Z isomer no longer 
fits in the cavity, so it gets 
pushed out before more light 
can twist it back. —JSY 

Science, adh9059, this issue p. 1357 


Reproducible 
preclinical science 


Irreproducible scientific stud- 
ies hamper clinical translation, 
and many researchers have 
called for more rigorous and 
unbiased preclinical trials.
Lyden et al. present the
outcome of a randomized 
and blinded preclinical trial to 
assess stroke interventions 
that was conducted by the 
Stroke Preclinical Assessment 
Network. A total of 2615 mice 
and rats were enrolled across 
six research laboratories 
that adhered to standardized 
protocols. Using a multiarm, 
multistage design, the authors 
tested six treatment candi- 
dates in four stages comprising 
different rodent models of isch- 
emic stroke. Interventions that 
did not meet statistical criteria 
were eliminated after each 
stage. Only one intervention, 
uric acid, exceeded the efficacy 
threshold at the last stage of 
the trial. These results show 
that standardized preclinical 
trials across multiple laborato- 
ries are feasible. —DN 
Sci. Transl. Med. (2023) 
10.1126/scitranslmed.adg8656 
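The elimination logic of a multiarm, multistage design, in which every remaining arm is scored at each stage and arms that miss the stage criterion are dropped, can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Arm` class, the per-stage effect estimates, the thresholds, and the function name `run_multistage` are hypothetical, and a bare cutoff on a point estimate stands in for the formal statistical stopping rules, which the summary does not detail.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """A hypothetical treatment arm with one effect estimate per stage."""
    name: str
    effect_by_stage: list  # illustrative standardized effect estimates


def run_multistage(arms, thresholds):
    """Drop arms whose estimate falls below each stage's threshold.

    Returns the names of the arms still in the trial after each stage.
    A real multiarm, multistage trial would apply prespecified statistical
    boundaries rather than a bare cutoff on the point estimate.
    """
    surviving = list(arms)
    history = []
    for stage, threshold in enumerate(thresholds):
        surviving = [a for a in surviving if a.effect_by_stage[stage] >= threshold]
        history.append([a.name for a in surviving])
        if not surviving:  # nothing left to test in later stages
            break
    return history


if __name__ == "__main__":
    arms = [
        Arm("A", [0.5, 0.6, 0.7, 0.8]),
        Arm("B", [0.3, 0.1, 0.0, 0.0]),
        Arm("C", [0.05, 0.9, 0.9, 0.9]),
    ]
    print(run_multistage(arms, thresholds=[0.1, 0.2, 0.3, 0.5]))
```

In this toy run, arm C is dropped at the first stage even though its later estimates are large, which shows the trade-off that early elimination accepts in exchange for testing fewer weak candidates in later, more expensive stages.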


Sequestering carbon 
in ancient Amazonia 


The origin of so-called “dark 
earths,” highly fertile soils 
located near many ancient and 
modern agricultural settlements 
in Amazonia, has been debated 
for decades. Two hypotheses 
for their origin include the 
deliberate enrichment of low- 
fertility tropical soils through 
the deposit of household food 
waste (the midden model) or 
their unintentional creation 
through cultivation itself. In a
comparative study of dark soils 
associated with archaeologi- 
cal sites and modern villages 
occupied by Indigenous peoples 
in southeastern Amazonia, 
Schmidt et al. show that there 
is strong evidence for the mid- 
den hypothesis. They further 
demonstrate that these soils are 
excellent carbon sinks and are 
an important element of sus- 
tainable agricultural practices 
that may also help to mitigate 
anthropogenic climate change. 
—MSA 
Sci. Adv. (2023) 
10.1126/sciadv.adh8499 
Hyperendemic epidemic
Despite general access to free 
drug treatment, Botswana still 
has one of the highest preva- 
lences of HIV worldwide, which 
is concentrated in very young 
women. Because of diamond 
mining, this country has urban- 
ized rapidly during the past 40 
years, generating a highly mobile 
and connected population. Song 
et al. wanted to know how much 
human population mobility 
contributes to sustaining HIV 
epidemics. Network analysis of 
microcensus data between 1981 
and 2011 showed that 10% of 
the population, both male and 
female and mostly between 
16 and 20 years of age, moved 
annually. Phylogenetic studies 
showed a wide distribution of 
HIV lineages across the country, 
attesting to mobility being a 
major epidemic driver. —CA 
eLife (2023) 10.7554/eLife.85435 
Magnetic tissue
development 
In nature, the development of 
complex, three-dimensional 
structures of tissues is guided 
by changes in mechanical force 
on individual cells. However, 
growing tissue in vitro in the 
form of organoids must rely 
either on self-organization or 
on the introduction of extrinsic 
forces such as those gener- 
ated by synthetic extracellular 
matrices. By labeling clusters 
of human pluripotent stem cells 
within organoids with magnetic 
nanoparticles, Fattah et al. were
able to use a magnetic field to 
impose local mechanical forces 
on both the magnetically tagged 
and nontagged cells in the 
organoid. This targeted force 
then guided asymmetric growth 
of the organoid. —SAL 
Nat. Commun. (2023) 
10.1038/s41467-023-41037-8 
BATTERIES
Understanding the role of 
mechanics 


Replacing a liquid electrolyte 
with a solid one has the potential 
to improve the capacity and 
safety of lithium metal batteries. 
Although the focus has been on 
the electrochemical behavior, 
internal stresses and strains can 
also substantially alter battery 
performance and lifetime. 
Kalnaus et al. reviewed our 
understanding of the mechanics 
of solid-state batteries and the 
effect of having multiple solid- 
solid interfaces. They also looked 
at ways to alleviate stresses 
through additional materials and 
designs to improve the lifetime 
and performance of these bat- 
teries. —MSL 
Science, abg5998, this issue p. 1300 


MOLECULAR BIOLOGY 
Initiating DNA replication 
in animals 


Chromosomes are copied only 
once during each cell division to 
allow inheritance of the genetic 
blueprint by the next generation. 
Until now, our understanding of 
how cells assemble the molecular
machinery for chromosome
duplication has relied heavily
on studies of yeast. AlphaFold-Multimer
(AF-M) is an artificial
intelligence-based system that 
can predict whether proteins 
interact. Lim et al. applied this 
technology to address the 
function of an enigmatic protein 
called DONSON, which was 
previously implicated in DNA 
replication. They used AF-M to 
predict which of the 20,000 
proteins expressed in human 
cells bind DONSON. Combined 
with experimental validation, 
this in silico screen indicated 
that DONSON is the missing link 
in the assembly of the complex 
that unzips the DNA double helix 
during replication. Working in 
the roundworm Caenorhabditis 
elegans, Xia et al. showed that 
the worm homolog of DONSON, 
DNSN-1, is essential for the
assembly of the replication 
machinery. Homologs of DNSN-1 
are absent from yeast but 
present in plants and animals. 
Mutations of the equivalent 
protein in humans cause a form
of microcephalic dwarfism often 
seen when assembly of the chro- 
mosome duplication machinery 
is defective. —DJ 

Science, adi3448, adi4932, this issue p. 1301, p. 1302


MACHINE LEARNING 
Missense effects 
predicted by AlphaFold 


Single–amino acid changes in
proteins sometimes have little 
effect but can often lead to prob- 
lems in protein folding, activity, or 
stability. Only a small fraction of 
variants have been experimentally 
investigated, but there are vast 
amounts of biological sequence 
data that are suitable for use 
as training data for machine 
learning approaches. Cheng et 
al. developed AlphaMissense, 
a deep learning model that 
builds on the protein structure 
prediction tool AlphaFold2 (see 
the Perspective by Marsh and 
Teichmann). The model is trained 
on population frequency data 
and uses sequence and predicted 
structural context, all of which 
contribute to its performance. 
The authors evaluated the 
model against related methods 
using clinical databases not 
included in the training and 
demonstrated agreement with 
multiplexed assays of variant 
effect. Predictions for all single–
amino acid substitutions in the
human proteome are provided as 
a community resource. —MAF 
Science, adg7492, this issue p. 1303; 
see also adj8672, p. 1284 


MOLECULAR BIOLOGY 
Short but mighty tandem repeats


Short tandem repeats (STRs) 
are common within regulatory 
elements in eukaryotic
genomes. Although changes 
in STR lengths often correlate 
with altered transcription, the 
mechanism by which they tune 
gene expression has remained 
mysterious. Horton et al. show 
that many transcription factor 
(TF) proteins directly bind STRs 
and that TF-preferred STRs 
need not resemble known binding
sites (see the Perspective
by Kuhlman). This binding can 
be explained and predicted 
by simple additive models in 
which repeated instances of 
low-affinity binding sites sum to 
have large effects. These find- 
ings suggest that STRs provide 
a regulatory mechanism to tune 
levels of TF binding and down- 
stream gene expression. —DJ 
Science, add1250, this issue p. 1304; 
see also adk2055, p. 1289 
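The additive idea described above, in which repeated low-affinity sites sum to a large effect, can be illustrated with a sketch where every window of a sequence contributes a small, fixed affinity and the contributions are simply added. The function name `additive_binding_score`, the dinucleotide affinity table, and its values are hypothetical; the authors' actual models may be parameterized differently.

```python
def additive_binding_score(sequence: str, kmer_affinity: dict, k: int) -> float:
    """Sum per-window affinities over every k-length window of `sequence`.

    `kmer_affinity` maps k-mers to hypothetical affinity contributions;
    k-mers absent from the table contribute nothing.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(
        kmer_affinity.get(sequence[i:i + k], 0.0)
        for i in range(len(sequence) - k + 1)
    )


if __name__ == "__main__":
    # Hypothetical weak affinity for the two dinucleotide frames of a (CA)n repeat.
    weak_ca = {"CA": 0.1, "AC": 0.1}
    for n in (3, 6, 12):
        print(n, round(additive_binding_score("CA" * n, weak_ca, k=2), 2))
```

Because each window adds independently, the score grows roughly linearly with repeat length, which is the kind of behavior that would let STR length act as a tuning knob.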


PREGNANCY 
Cell swap for protection 


During pregnancy, some 
exchange of cells occurs 
between the mother and 
the offspring, with fetal cells 
persisting in small quantities 
in the mother (microchimeric 
cells) and, conversely, maternal 
microchimeric cells in the fetus. 
These cells help to promote 
the expansion of T regulatory 
cells that blunt the immune 
response and ensure that the 
mother’s immune system does 
not reject the fetus. Shao et 
al. tracked the persistence of 
maternal and fetal microchi- 
meric cells in mouse models 
in multiple pregnancies sired 
by immunologically different 
fathers (see the Perspective by 
Porrett). The authors deter- 
mined that fetal cells from a 
previous pregnancy persist 
in subsequent ones, whereas 
maternal microchimeric cells 
present in a daughter mouse 
get replaced by offspring cells 
when the daughter mouse 
becomes pregnant. —YN 
Science, adf9325, this issue p. 1324; 
see also adk1218, p. 1286


CARTELS 
Controlling violence by 
curbing recruitment 


Homicides in Latin America are 
driven by violent cartels. The 
impact of Mexican cartels is 
especially far-reaching because 
they prey upon undocumented 
migrants along the US-Mexico 
border, violate human rights, and 
weaken political and economic 
institutions. However, cartels 
remain mysterious despite being 
a major employer. Because 
understanding how Mexican 
cartels function is essential 
to attenuating their power, 
Prieto-Curiel et al. conducted 
a sophisticated analysis that 
estimated their population size 
and examined factors driving 
cartel growth and shrinkage (see 
the Policy Forum by Caulkins et 
al.). Factors included “recruit- 
ment” (new cartel members 
join), “incapacitation” (police 
incarcerate or arrest members), 
“conflict” (cartels fight other car- 
tels), and “saturation” (members 
leave). Findings suggest that 
reducing “recruitment” instead 
of increasing “incapacitation” is 
a much more effective policy to 
decrease violence. —EEU 
Science, adh2888, this issue p. 1312; 
see also adj8911, p. 1291 


MEMBRANES 
Gas separation using 
mixed matrix membranes 


Nanoporous crystalline materials,
represented by zeolites
and metal-organic frameworks
(MOFs), naturally contain con- 
tinuous pore systems that can 
enable the separation of gases, 
but it is difficult to process them 
into robust, large sheets. Chen 
et al. developed a solid-solvent 
technique for making thin, highly 
loaded, and defect-free mixed 
matrix membranes (see the 
Perspective by Yang and Zhao). 
Precursor metal salts are dis- 
solved into a polymer and then 
converted into a MOF material. 
The authors demonstrate the 
separation of hydrogen and
carbon dioxide, with high permeance
and selectivity. —MSL
Science, adi1545, this issue p. 1350; 
see also adk1794, p. 1288 


CANCER IMMUNOLOGY 
Sparking T cell response 


The mitochondrial electron 
transport chain (ETC) is required 
for T cell immune responses 
to antigens. Mangalhara et al. 
explored whether modulating 
the mitochondrial ETC could 
influence cancer growth and 
tumor immunity (see the 
Perspective by Pouikli and 
Frezza). Increasing electron flow 
through mitochondrial complex I
elevated succinate levels and led
to transcriptional and epigenetic
activation of major histocompatibility
complex class I (MHC-I)
and antigen presentation and
processing genes. Induction of
MHC-I occurred independently
of the cytokine interferon- 
gamma and resulted in potent 
T cell responses to melanoma, 
suggesting an approach for 
improving tumor immunogenic- 
ity. —PNK 

Science, abq1053, this issue p. 1316;
see also adk1785, p. 1287


TUMOR IMMUNOLOGY 
Regulatory T cells of 
distinction 


Blocking the suppressive 
activity of CD4+ regulatory T
cells (Tregs) to reinvigorate the
immune system against tumors
comes with substantial risks
of adverse effects because of
their critical role in immune
tolerance. Shan et al. identified
a subpopulation of Tregs specifically
enriched in the tumor microenvironment
by comparing CD4+
T cells from healthy individuals 
with those from patients with 
head and neck squamous cell 
carcinomas. Characterized by 
gene expression programs and 
suppressor function under the 
control of the transcription 
factor BATF, these distinct Tregs
were associated with poorer 
outcomes across a variety of 
cancers. It may thus be possible 
to inhibit tumor-infiltrating 
Tregs by targeting the distinct
gene-regulatory networks that 
control their suppressor func- 
tion without impairing immune 
homeostasis in general. —SHR 
Sci. Immunol. (2023) 
10.1126/sciimmunol.adf6717 


CANCER 
Alternative splicing for 
invadopodia 


Different transcripts can be 
produced from the same gene 
through alternative splicing. 
Li et al. found that colorectal 
cancer cells formed metastasis- 
associated structures called 
invadopodia that are caused by 
alternative splicing. Signaling 
from the tumor stroma induced 
the expression of a long noncod- 
ing RNA in colorectal cancer 
cells. The long noncoding RNA 
altered the way that a splicing 
factor acted on a target pre- 
messenger RNA, resulting in 
more of the longer transcript and 
less of the shorter, tumor-sup- 
pressive transcript. This switch 
enabled invadopodia formation 
and invasion in colorectal cancer 
cells. —LKF 
Sci. Signal. (2023) 
10.1126/scisignal.adh4210 
Grafted tomato and
eggplant, or eggmato


PLANT SCIENCE
Grafting for success


Grafting plant roots with the
shoots of another plant is
common in many agronomic
settings. This technique allows
genotypes with beneficial traits
to be combined, for example, by
grafting disease-resistant roots with
high-yield shoots. However, only
certain combinations of species and
genotypes are compatible. Thomas
et al. examined graft compatibility
between four Solanaceae species.
The authors found that whereas
most interspecies combinations
could survive for up to 30 days, only
tomato and eggplant grafts showed
genuine compatibility with each
other and formed robust vascular
connections. Groundcherry and
pepper were only compatible when
self-grafted. Tomato and eggplant
are more closely related to each
other than to the other species,
suggesting that phylogenetic constraints
limit graft compatibility within the
Solanaceae. —MRS
J. Exp. Bot. (2023) 10.1093/jxb/erad155


MACHINE LEARNING
Watermarks for large
language models


Large language models (LLMs)
are becoming an increasingly
powerful tool for many appli-
cations, yet serious concerns
remain about how to detect
LLM-generated synthetic text
and avoid plagiarism risks such
as cheating in answering ques-
tions, academic writing, and
computer science assignments.
Kirchenbauer et al. propose a
simple method for watermark-
ing LLM output (invisible data
modifications that hide identify-
ing information but make the
text recognizable as synthetic)
that requires no access to the
language model or knowledge
of its parameters to decode the
watermarks. Although there are
many open questions regarding
its practical implementation,
reliability, and security, the
proposed method could affect
how we think about and operate 
with LLMs. —YS 
PMLR (2023) 
https://openreview.net/forum?id=aX8ig9X2a7
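The summary does not spell out the mechanism. As we understand this style of watermark, generation pseudorandomly marks a fraction gamma of the vocabulary as "green," seeded by the preceding token, and nudges sampling toward green tokens; detection then counts green tokens and computes a one-proportion z-score, which needs only the key and the token sequence, not the model. The sketch below assumes that reading. The SHA-256 keyed hash, the key string, gamma = 0.5, the per-pair green test (rather than an exact vocabulary partition), and the greedy toy generator are all illustrative choices, not the paper's implementation.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, gamma: float, key: str) -> bool:
    """Keyed pseudorandom test: is `token` green given the preceding token?

    Simplification (assumption): each (prev_token, token) pair is hashed
    independently, so roughly a fraction `gamma` of the vocabulary is green
    in any context, instead of an exact partition of the vocabulary.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return u < gamma


def watermark_z_score(tokens, gamma=0.5, key="demo-key"):
    """z-statistic for the number of green tokens versus the chance rate gamma."""
    t = len(tokens) - 1  # number of scored (context, token) pairs
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, c, gamma, key) for p, c in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def toy_watermarked_sequence(length, vocab_size=1000, gamma=0.5, key="demo-key"):
    """Stand-in generator that always emits the lowest-id green token.

    A real sampler would add a bias to green-token logits and sample from
    the model; this greedy rule just yields maximally watermarked tokens.
    """
    tokens = [0]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next(t for t in range(vocab_size) if is_green(prev, t, gamma, key)))
    return tokens


if __name__ == "__main__":
    marked = toy_watermarked_sequence(100)
    unmarked = list(range(101))  # arbitrary token ids with no watermark
    print(watermark_z_score(marked), watermark_z_score(unmarked))
```

For text in which every token is green, the statistic reduces to the square root of the number of scored tokens, so even short passages stand far above the roughly standard-normal scores expected for unwatermarked text.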


CANCER 
Macrophage targeting in brain cancer


Glioblastoma (GBM) is an 
aggressive brain tumor that 
has limited treatment options. 


Targeting a histone-modifying enzyme overcomes resistance to immune checkpoint 
therapy in mouse brain tumors. 


GBM tumors frequently harbor 
certain types of myeloid cells, 
immune cells that can suppress 
beneficial immune-based thera- 
pies. Goswami et al. performed 
single-cell and spatial transcrip- 
tomic analyses of human GBM 
tumors with the goal of identify- 
ing new therapeutic strategies. 
After examining the epigenetic
landscape of brain tumors, the
authors found that the enzyme histone H3
lysine 27 demethylase (KDM6B) was
overexpressed in
GBM-associated human myeloid
cells. In a model system,
genetic inactivation or pharmacological
inhibition of KDM6B
enhanced proinflammatory 
pathways, reduced the growth 
of experimental tumors, and 
improved sensitivity to anti-PD-1 
immunotherapy. —PNK 
Nat. Cancer (2023) 
10.1038/s43018-023-00620-0 


SOCIOLOGY 
Identity theft: 
Who’s to blame? 


As more organizations collect 
and use more personal data, 
financial identity theft becomes 
an increasingly important 
source of economic insecurity, 
responses to which are shaped 
by social structures. Brensinger’s 
study of US identity theft victims
showed that middle- and upper- 
income people and whites tend to 
blame organizations, from which 
they demand improved protec- 
tion, but low-income people and 
people of color tend to be suspi- 
cious of their personal networks 
and to end relationships as a 
result. These behavioral differ- 
ences may lead those who most 
need support to cut themselves 
off from it and may exacerbate 
disparities in organizational 
accountability. —BW 
Am. Soc. Rev. (2023) 
10.1177/00031224231189895 


METALLURGY
Resisting fatigue
Metal alloys often fail due to 
fatigue, which occurs when 
cracks nucleate and grow after 
cyclic loading. Dan et al. created a 
fatigue-resistant aluminum alloy 
by using additive manufacturing 
to create a dual-phase cellular 
microstructure. The microstruc- 
ture boosts fatigue resistance by 
a factor of about two compared 
with other aluminum alloys made 
by additive manufacturing. The 
dual-phase cellular microstruc- 
ture helps to confine damage by 
acting as a dislocation barrier 
and represents a more general 
strategy for other alloys. —BG 
Nat. Mater. (2023) 
10.1038/s41563-023-01651-9 
BATTERIES


Solid-state batteries: The critical role of mechanics 


Sergiy Kalnaus*, Nancy J. Dudney, Andrew S. Westover, Erik Herbert, Steve Hackney 


BACKGROUND: Solid-state batteries (SSBs) have 
important potential advantages over tradi- 
tional Li-ion batteries used in everyday phones 
and electric vehicles. Among these potential 
advantages are higher energy density and faster
charging. A solid electrolyte separator may 
also provide a longer lifetime, wider operating 
temperature, and increased safety due to the 
absence of flammable organic solvents. One of 
the critical aspects of SSBs is the stress re- 
sponse of their microstructure to dimensional 
changes (strains) driven by mass transport. 
The compositional strains in cathode particles 
occur in liquid electrolyte batteries too, but in 
SSBs these strains lead to contact mechanics 


problems between expanding or contracting 
electrode particles and solid electrolyte. On the 
anode side, plating of lithium metal creates its 
own complex stress state at the interface with 
the solid electrolyte. A critical feature of SSBs is 
that such plating can occur not only at the 
electrode-electrolyte interface but within the 
solid electrolyte itself, inside its pores or along 
the grain boundaries. Such confined lithium 
deposition creates areas with high hydrostatic 
stress capable of initiating fractures in the 
electrolyte. Although the majority of failures 
in SSBs are driven by mechanics, most of the 
research has been dedicated to improving ion 
transport and electrochemical stability of electrolytes.
As an attempt to bridge this gap, in this
review we present a mechanics framework
for SSBs and examine leading research in the
field, focusing on the mechanisms by which
stress is generated, prevented, and relieved.

Figure: The promise of solid-state batteries (panels: Solid Electrolyte; Control of Li Metal; Electrolyte Microstructure). SSBs offer a variety of multifunctional and safe solutions if important breakthroughs are made in engineering cell components and eliminating the need for tremendous external pressure to keep interfaces intact.

Kalnaus et al., Science 381, 1300 (2023)


ADVANCES: The push toward renewable re- 
sources requires the development of next- 
generation batteries with energy densities 
more than double that of current batteries and 
that can charge in 5 min or less. This has led to 
a race to develop electrolytes that can both 
facilitate 5-min fast charging and enable Li 
metal anodes—the key to high energy. The 
discovery of solid electrolytes that have high 
electrochemical stability with Li metal and 
sulfide solid electrolytes with ionic conduc- 
tivities greater than those of any liquid elec- 
trolyte have spurred a shift in the research 
community toward SSBs. Although these
discoveries have seeded the promise that SSBs
can enable the vision of fast charging and a 
doubling of energy density, realization of this 
promise is feasible only if the mechanical be- 
havior of battery materials is thoroughly under- 
stood and multiscale mechanics is integrated 
in the development of SSBs.


OUTLOOK: Several key challenges must be addressed,
including (i) nonuniform lithium plating on a solid
electrolyte surface and deposition
of lithium metal within the solid electrolyte; 
(ii) loss of interfacial contact within the cell 
as a result of the volume changes associated 
with the electrochemical cycling that occurs 
at electrode contacts and also at grain bound- 
aries; and (iii) manufacturing processes to 
form SSBs with a very thin solid electrolyte 
and a minimum of inactive components, including
binders and structural supports. Mechanics is a
common denominator connecting these problems.
Deposition of metallic lithium into the surface and volume defects of a
ceramic solid electrolyte results in local high 
stresses that can lead to electrolyte fracture
with further propagation of metallic lithium 
into the cracks. In manufacturing, as a mini- 
mum requirement, the cathode-electrolyte 
stacks should possess enough strength to with- 
stand the forces applied by the equipment. A 
better understanding of the mechanics of SSB 
materials will transfer to the development of 
solid electrolytes, cathodes, anodes, and cell 
architectures, as well as battery packs designed
to manage the stresses of battery manufacturing
and operation.
*Corresponding author. Email: kalnauss@ornl.gov

Cite this article as S. Kalnaus et al., Science 381, eabg5998 
(2023). DOI: 10.1126/science.abg5998 


READ THE FULL ARTICLE AT
https://doi.org/10.1126/science.abg5998 




RESEARCH 


BATTERIES 


Solid-state batteries: The critical role of mechanics 


Sergiy Kalnaus, Nancy J. Dudney, Andrew S. Westover, Erik Herbert, Steve Hackney


Solid-state batteries with lithium metal anodes have the potential for higher energy density, longer lifetime, 
wider operating temperature, and increased safety. Although the bulk of the research has focused on 
improving transport kinetics and electrochemical stability of the materials and interfaces, there are also 
critical challenges that require investigation of the mechanics of materials. In batteries with solid-solid 
interfaces, mechanical contacts and the development of stresses during operation of the solid-state
batteries become as critical as electrochemical stability for maintaining steady charge transfer at these
interfaces. This review will focus on stress and strain that result from normal and extended battery 
cycling and the associated mechanisms for stress relief, some of which lead to failure of these batteries. 


Developing the next generation of solid-state
batteries (SSBs) will require a paradigm shift in the way we think about
and engineer solutions to materials challenges (1-4), including the way we
conceptualize the operation of a battery and its
interfaces (Fig. 1). Solid-state Li metal batteries 
that utilize a Li metal anode and a layered 
oxide or conversion cathode have the poten- 
tial to almost double the specific energy of 
today’s state-of-the-art Li-ion batteries, which 
use a liquid electrolyte. Storing and releasing 
this energy, however, comes with dimensional 
changes in the electrodes: lattice stretches and 
distortions in cathodes and metallic lithium 
deposition in anodes. Liquid electrolytes can 
instantly accommodate volume changes of 
the electrodes without stress buildup in the 
electrolyte or loss of contact with the cathode 
particles. When switching to an SSB, how- 
ever, these compositional strains, the stresses 
they cause, and how these stresses are relieved 
are integral to battery performance. The majority
of failures in SSBs are first and foremost mechanical
failures. Successful design of SSBs
will be integrally linked to how effectively the 
materials manage the evolution of stress and 
strain in these batteries. 

Paramount to achieving the high energies in 
SSBs is the use of a Li metal anode. Historically,
Li metal anodes have been regarded as
unsafe owing to the propensity to grow Li de- 
posits, which penetrate the cell, causing short- 
ing and subsequent thermal runaway (5). These 
deposits most often are termed dendrites in 
the literature, although sometimes they are called
filaments. Neither term, in our opinion, is en- 
tirely accurate. The most promising solution to 
this issue of lithium growth was to use a solid- 
state electrolyte (SSE) in place of a liquid elec- 
trolyte, as it has the potential to mechanically 
suppress the penetration of Li dendrites. How- 
ever, practical experience with prototype solid- 
state Li metal batteries has shown that Li has 
an unusual propensity to penetrate and frac- 
ture even the strongest electrolyte materials 
(6-9). Key to solving both the challenges of the 
cathode-electrolyte interface and Li-electrolyte 
interfaces is a clear understanding of the me- 
chanics of all the materials involved across 
battery-relevant length scales, temperatures, 
and strain rates. We especially want to em- 
phasize the importance of considering the 
mechanical response to strain in materials as a 
function of length scale and strain rate. In an 
SSB cathode, particles are typically ~6 to 10 µm
in size with individual grains on the nano- 
meter scale, whereas nonuniform Li deposits 
can range from nanometers to millimeters in 
scale. This breadth of scale can result in vastly 
different mechanical responses and failure
mechanisms. Furthermore, next-generation 
batteries must be able to operate at rapid 
charging times and discharge times that range 
from extremely fast pulses to slow discharging 
over the course of several days. The different 
strains imposed by these various charging and 
discharging rates can substantially affect the 
evolution of stress and strain. 

Current anodes, SSEs, and cathodes are 
constructed from a diverse range of mate- 
rials. This article will not be a comprehensive 
overview (1, 3, 10) of SSB materials but rather 
will focus on providing a framework by which 
one can analyze and identify the most critical 
mechanical properties and gaps in the re- 
search and apply them to each material sys- 
tem in question. 


A mechanics perspective of a battery 
in operation 


During battery operation, redox reactions oc- 
cur simultaneously at the cathode-electrolyte 
and Li-electrolyte interfaces. On the cathode 
side (assuming a layered oxide cathode), Li+
intercalates into the structure, creating a gra- 
dient in the lattice parameter and a concur- 
rent nonuniform deviatoric elastic strain and 
volume change (dilatation). The magnitude of 
the volume change, and whether the lattice 
expands or contracts, as well as whether the 
expansion or contraction is asymmetric or 
symmetric, depends on the specific chemical 
composition and structure of the cathode active 
material (11). In most cases, charging will result
in a decrease in the lattice parameters
as Li is extracted from the structure and an 
increase in lattice size as the Li is reinserted 
into the structure. This results in dilatation of
secondary cathode particles ranging from 3.3% in
LiNi1/3Mn1/3Co1/3O2 (NMC111) to 7.8% in
LiNi0.8Mn0.1Co0.1O2 (NMC811) (12). Such expansion
or contraction induces stress at the
cathode-SSE interface. If the resulting tensile 
component of this interfacial stress exceeds
the strength of the stressed material, then the 
predictable outcome is brittle fracture. This 
unwanted stress relief can occur within the 
cathode active particles, along the cathode- 
solid electrolyte interface, and/or in the solid 
electrolytes themselves. 
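To put the quoted dilatations in linear terms, one can assume that a particle changes volume isotropically, so each linear dimension scales by the cube root of the volume ratio. This is a simplification, since the text notes that lattice changes are often asymmetric; the function name is ours, and only the 3.3% and 7.8% inputs come from the text.

```python
def isotropic_linear_strain(volumetric_strain: float) -> float:
    """Linear strain for a volume change dV/V, assuming isotropic expansion."""
    if volumetric_strain <= -1.0:
        raise ValueError("volume cannot shrink to zero or below")
    return (1.0 + volumetric_strain) ** (1.0 / 3.0) - 1.0


if __name__ == "__main__":
    # Dilatation values quoted in the text for NMC111 and NMC811 secondary particles.
    for label, dv in (("NMC111", 0.033), ("NMC811", 0.078)):
        print(f"{label}: {dv:.1%} volume -> {isotropic_linear_strain(dv):.2%} linear")
```

Under this assumption, 3.3% and 7.8% in volume correspond to roughly 1.1% and 2.5% in each linear dimension, close to the small-strain estimate dV/V ≈ 3ε; any anisotropy concentrates more of the change along particular lattice directions.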

On the anode side, Li deposition is typically 
nonuniform, resulting in localized stresses at 
the interface that are unstable and promote 
roughening of the Li and the formation and 
growth of Li filaments. These filaments are 
capable of penetrating and/or fracturing the 
SSE, thus degrading performance or causing 
the cell to fail (7, 9, 13). An equally important 
observation is that the sustainable pressure 
within small, constrained volumes of metal- 
lic Li is heavily dependent on the strain rate, 
which is directly related to the operational 
current density (14, 15). This has implications 
for applications that use rapid charging rates 
and the need for occasional intense pulses. For 
electric vehicles, battery charging will need to 
occur at a rate of 4C or greater, corresponding to more than
1 µm/min of Li plating. Furthermore, occasional
pulsing could require rates as high as 20C, or
a Li stripping rate of >5 µm/min, albeit for
very short durations.
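The link between C-rate and plating rate follows from Faraday's law: a current density j deposits dense Li at a thickness rate j·V_m/F, where V_m is the molar volume of Li and F is the Faraday constant. The sketch below assumes bulk Li properties, defines 1C as the current that delivers the full areal capacity in one hour, and uses an areal capacity of 3 mAh/cm², which is an illustrative assumption not stated in the text.

```python
FARADAY_C_PER_MOL = 96485.33
LI_MOLAR_MASS_G_PER_MOL = 6.94
LI_DENSITY_G_PER_CM3 = 0.534


def li_plating_rate_um_per_min(c_rate: float, areal_capacity_mah_cm2: float) -> float:
    """Thickness of dense Li plated (or stripped) per minute at a given C-rate."""
    # 1C delivers the full areal capacity in 1 h, so the current is C-rate x capacity.
    current_a_cm2 = c_rate * areal_capacity_mah_cm2 * 1e-3
    molar_volume_cm3 = LI_MOLAR_MASS_G_PER_MOL / LI_DENSITY_G_PER_CM3
    rate_cm_per_s = current_a_cm2 * molar_volume_cm3 / FARADAY_C_PER_MOL
    return rate_cm_per_s * 1e4 * 60  # cm/s -> um/min


if __name__ == "__main__":
    for c in (4, 20):
        print(f"{c}C: {li_plating_rate_um_per_min(c, 3.0):.2f} um/min")
```

With the assumed 3 mAh/cm², 4C comes out near 1 µm/min and 20C near 5 µm/min, in line with the figures quoted above; a higher areal capacity raises both rates proportionally.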

As rapid volume changes occur, they will 
cause notable problems for battery operation. 
Therefore, it is imperative to (i) minimize or 
eliminate the sources of elastic strain, in par- 
ticular, localized gradients in elastic strain, 
and (ii) engineer materials with efficient stress 
relief mechanisms (for example, enable local- 
ized ductility) capable of operating over a wide 
range of battery-relevant length scales, tem- 
peratures, and strain rates without resorting 
to stress relief by fracture. Furthermore, as the 
electrochemical processes are reversed, it is 
important that concurrent changes in the 
state of stress not inhibit intimate contact 
between the electrodes and the electrolyte at 
the interfaces. In some instances, this require- 
ment may necessitate a form of reversible 
strain. 


Minimizing gradients in elastic strain 


One of the most promising ways to prevent
mechanical instabilities is to eliminate
the gradients in elastic strain that inevitably
develop as a result of localized, inhomogeneous 
Li-ion transfer kinetics. Although this may 
seem like an almost impossible task, there is a 
considerable amount of research in SSBs that 
centers around this approach. Many of the im- 
provements in SSBs have arisen from efforts 
to minimize or eliminate elastic strain at both 
the cathode-electrolyte and anode-electrolyte 
interfaces. 


Zero effective strain cathodes 


In most traditional cathode materials, the cath- 
ode experiences appreciable volume change. 
This is exacerbated by asymmetric changes
in the lattice constants, especially for layered
oxide cathode structures. There is an area of 
cathode research that focuses on the devel- 
opment of zero-strain (ZS) cathodes that ad- 
dresses this challenge. In these materials, as 
Li is extracted and intercalated, the material 
experiences no change in the lattice parame- 
ters resulting in a net zero strain. The most 
common example of these materials is the 
spinel Li,Ti;0;2 (11). Though a key research 
material, its low voltage has limited its use in 
large-scale commercial applications. Work with 
the LiCoO, (LCO) spinel has demonstrated that 
asmall amount of Al doping can both stabilize 
the cathode and ensure that it too is ZS but 
with a higher capacity and higher voltage pro- 
file (16). Although low-Co cathodes are needed 
for widespread implementation in electric ve- 
hicles, owing to issues with Co availability, it 
should be possible to design Co-free cathodes 
with high voltage and capacity that are also ZS. 
The development of LiNi,Mn,Co,0. (NMC) 
cathodes that substantially minimize the vol- 
ume change as a function of state of charge 
has been demonstrated (17). Using computa- 
tional ab initio studies, the three variables— 
chemical composition, ionic ordering, and metal 
coordination—were established as critical for 
design of ZS cathode materials in (78), anda 
near-ZS disordered rocksalt Liy,3Vo.4Nbo302 
and Liyo5Vo.5sNbg.201,9Fo1 cathodes were syn- 
thesized and characterized. 
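Whether a cathode is zero strain is judged from how its unit-cell dimensions change with Li content. In the hexagonal setting commonly used to index layered oxides, V = (√3/2)a²c, so a contraction along a can be partly offset by an expansion along c. The sketch below computes both the volumetric strain and the individual linear strains from two sets of lattice parameters; the values in the usage example are hypothetical placeholders, not measurements for any specific cathode.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume of a hexagonal lattice, V = (sqrt(3)/2) * a^2 * c."""
    return math.sqrt(3.0) / 2.0 * a * a * c


def volumetric_strain(a0: float, c0: float, a1: float, c1: float) -> float:
    """Relative volume change (V1 - V0) / V0 between two states of charge."""
    v0 = hexagonal_cell_volume(a0, c0)
    v1 = hexagonal_cell_volume(a1, c1)
    return (v1 - v0) / v0


def linear_strains(a0: float, c0: float, a1: float, c1: float) -> tuple[float, float]:
    """Linear strains along a and c; these can be large even when dV/V is small."""
    return (a1 - a0) / a0, (c1 - c0) / c0


if __name__ == "__main__":
    # Hypothetical lattice parameters (angstrom) before and after delithiation.
    a0, c0, a1, c1 = 2.87, 14.20, 2.82, 14.50
    eps_a, eps_c = linear_strains(a0, c0, a1, c1)
    print(f"eps_a = {eps_a:+.2%}, eps_c = {eps_c:+.2%}")
    print(f"dV/V  = {volumetric_strain(a0, c0, a1, c1):+.2%}")
```

Reporting the linear strains alongside dV/V matters because a near-zero volume change can still hide opposing strains along a and c, which produce local stress at particle contacts.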

A second strategy is to pair cathode active materials with negative strain (LCO) and positive strain (NMC) so as to eliminate the overall volume expansion and contraction of the cathode, as demonstrated in (17). Although this approach may help mitigate mechanical failure at the cathode-electrolyte interface, local strain and the associated stress may still cause substantial degradation of cathode particles.
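To first order, the pairing strategy can be estimated with a linear rule of mixtures that treats the composite's volumetric strain as the volume-fraction-weighted average of its constituents' strains. This is a simplifying assumption (it ignores particle interactions and differences in each material's state-of-charge window), and the strain values below are hypothetical.

```python
def blend_strain(fraction_a: float, strain_a: float, strain_b: float) -> float:
    """Volume-weighted (rule-of-mixtures) strain of a two-component cathode blend."""
    return fraction_a * strain_a + (1.0 - fraction_a) * strain_b


def zero_strain_fraction(strain_a: float, strain_b: float) -> float:
    """Volume fraction of component A that cancels the blend's average strain.

    Only possible when the two strains have opposite signs.
    """
    if strain_a * strain_b >= 0:
        raise ValueError("strains must have opposite signs to cancel")
    return strain_b / (strain_b - strain_a)


if __name__ == "__main__":
    # Hypothetical volumetric strains over one charge: A contracts, B expands.
    eps_a, eps_b = -0.02, 0.03
    f = zero_strain_fraction(eps_a, eps_b)
    print(f"fraction of A = {f:.2f}, average strain = {blend_strain(f, eps_a, eps_b):+.1e}")
```

The cancellation applies only to the average: each particle in the blend still undergoes its own local strain, which is why the text cautions that particle-level degradation can persist.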


Planar plating and stripping of Li metal 


Most research in which Li metal anodes are paired with either solid or liquid electrolytes shows that Li deposition is nonuniform and results in the formation of both dead Li and Li filaments that cause cracking and failure in solid electrolytes (13, 19-21). For optimal Li metal anode operation, the goal is to enable planar, layer-by-layer, stress-free plating and stripping of Li metal. This is challenging, as the origins of nonuniform deposition, especially from single-ion-conducting solid electrolytes, are not well understood. Nonetheless, some of the most impressive increases in Li metal battery performance have come from improvements in the homogeneity of Li plating and stripping. An Ag-C interface layer, for example, enables homogeneous plating and stripping of Li metal (22). This technique of using thin metal films that alloy with Li metal has been shown to improve the homogeneity of Li plating and stripping in other situations as well (23, 24).

Another method to improve plating and stripping homogeneity has been to improve the wetting of solid electrolytes by Li metal (25). This is evident in research efforts to improve the critical current density of garnet lithium lanthanum zirconium oxide (LLZO) solid electrolytes. Two approaches have been shown to substantially improve the wetting of LLZO by Li metal. The first centers on removing interfacial defects, including Li carbonate, from the surface. Removing these impurities improves the wetting of LLZO by Li metal and boosts the critical current density (25). A second method has been to apply thin coatings such as Al2O3 onto the LLZO surface by atomic-layer deposition or other methods (24). Although this approach has improved room-temperature critical currents beyond 1 to 2 mA/cm², high rates still result in cracking and eventual failure. This has been linked to void formation at the Li interface upon stripping, resulting in a nonplanar Li morphology (26-28).

Another factor contributing to nonuniform plating and stripping of Li metal is the condition of the Li itself. The Balsara group used x-ray tomography to examine the purity of Li metal films (29-31). They found that most commercial Li metal foils contain a large number of defects scattered throughout the metal. When these impure Li metal foils are used with polymer electrolytes, stripping causes the defects to agglomerate at the interface. These agglomerates mechanically penetrate the separator and cause Li to plate nonuniformly.


Stress-relief mechanisms operating in SSBs 


Because local stresses will inevitably arise from Li transport and deposition, it is critical to consider possible stress-relief mechanisms in both the Li metal and the SSE. The goal is to activate inelastic or viscoelastic strain to reduce the stress magnitude. The mechanisms of such activation differ among classes of solid electrolytes and metallic lithium. Whether a solid-state electrolyte can manage the stresses induced by the strain that the redox reaction imposes will depend on the availability of operational stress-relief mechanisms at the applied current densities (strain rates) and service temperatures. When inelastic flow cannot be activated at a specific length and time scale, stress relief proceeds through fracture.


Plastic deformation in Li metal 


As in most metals at room temperature, the mechanism that enables plastic deformation (stress relief) in Li metal is shear-driven dislocation glide. Unlike most metals at room temperature, however, Li is also at a sizable fraction of its absolute melting temperature (described as a high homologous temperature, T_H). In general, a high T_H promotes thermally activated glide, which is why bulk Li metal is soft and malleable when handled. High homologous temperatures also mean that Li metal is continuously annealing. As demonstrated in (4), the net effect is that bulk Li has no notable capacity for work hardening at room temperature unless the strain rate is sufficiently high. High homologous temperatures also enable other thermally activated stress-relief mechanisms (creep) to operate in parallel, albeit with widely varying degrees of operational efficiency. Within this framework, it has been shown experimentally (4, 32, 33) that over a wide range of battery-relevant operating conditions, plastic deformation in bulk Li metal generally limits the flow stress to ~1 MPa or less. However, recent nanoindentation, micropillar, and nanopillar investigations (34-36) have demonstrated that within small, constrained volumes of Li metal, the operational efficiency of dislocation glide can become severely limited even at high T_H. Depending on the mechanism enabling stress relief, pure metallic Li can support more than 200 MPa (Fig. 2). For example, the yield strength of lithium whiskers subjected to uniaxial compression increases from 12 to 244 MPa when the diameter decreases from 607 to 76 nm (36). One potential explanation for this observation is that plasticity becomes governed by dislocation nucleation. The dimensions of these submicrometer whiskers are highly relevant because they correlate well with the observed thickness of lithium filaments in single-crystal LLZTO, i.e., ~300 nm (8). The size effect (“smaller is stronger”) has been well documented for metals with cubic structure not only in compression but also in bending (37), tension (38), and torsion (39). Across these states of stress, at small length scales there is a lower probability of finding, within the stressed volume, the dislocation density required to enable efficient flow (stresses < ~1 MPa).
At indentation depths less than ~300 nm, pure Li metal was also found to support stresses nearly as high as 200 MPa. Herbert et al. (14) rationalized this observation by proposing that the dominant mechanism in Li changes from dislocation-mediated flow to diffusional flow, the efficiency of which depends strongly on the strain rate and length scale, the two parameters directly related to the local current density and to the defect or pore size in the solid electrolyte. At a critical length scale, termed the defect “danger zone,” however, the local stress can be maximized because the stressed volume is too large for efficient diffusional flow (the diffusion length is too long) and yet too small to contain the number of dislocations needed to trigger inelastic strain by dislocation motion. The important outcome is that the stress within Li-filled interface defects in the danger zone can be orders of magnitude higher than the flow stress of bulk polycrystalline Li metal (14, 40).

Fig. 2. Length-scale and rate-dependent mechanics of lithium metal. Li bulk results (left) are tensile tests of Li ribbons from (4). Li microscale results (right) include compression of Li micropillars (35) and nanowhiskers (36). An orders-of-magnitude increase in yield stress can be observed as the length scale decreases from millimeters to nanometers.
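Two of the quantities discussed above can be checked with short calculations. The homologous temperature is T/T_m; with Li melting at 453.65 K (180.5 °C), room temperature is about 0.66 of its melting point, compared with roughly 0.32 for Al and 0.22 for Cu. The whisker data of (36) can also be summarized by an apparent exponent m, assuming a power law σ_y ∝ d^(-m). That functional form is an assumption here, and two points only fix m for it; they do not identify the underlying mechanism.

```python
import math

MELTING_POINT_K = {"Li": 453.65, "Al": 933.47, "Cu": 1357.77}


def homologous_temperature(temperature_k: float, melting_point_k: float) -> float:
    """Ratio of absolute temperature to absolute melting temperature."""
    return temperature_k / melting_point_k


def power_law_exponent(d1: float, s1: float, d2: float, s2: float) -> float:
    """Exponent m of sigma = A * d**(-m) passing through two (size, strength) points."""
    return math.log(s2 / s1) / math.log(d1 / d2)


if __name__ == "__main__":
    for metal, tm in MELTING_POINT_K.items():
        print(f"{metal}: T_H at 298 K = {homologous_temperature(298.15, tm):.2f}")
    # Whisker diameters (nm) and yield strengths (MPa) quoted in the text (36).
    m = power_law_exponent(607.0, 12.0, 76.0, 244.0)
    print(f"apparent size-effect exponent m = {m:.2f}")
```

The apparent exponent comes out near 1.45, i.e., strength rises faster than inversely with diameter over this range.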

There have also been many computational and theoretical investigations targeting the mechanics of lithium at the micro- and nanoscale. Building on the work of Narayan (41), who used large-deformation theory to model the shear flow of lithium, incorporating Nabarro-Herring (volume diffusion) creep, Harper-Dorn (nonconservative dislocation motion) creep, and interface diffusional flow would provide much-needed insight into interface stability at relevant length scales. This analysis would also facilitate important extensions of the models presented by Barai and Srinivasan (42, 43), Verma and Mukherjee (44), and their colleagues, and could contribute to our understanding of the role of external pressure (stack pressure). This can be achieved by quantifying interface friction and examining the radial stress distribution and its dependence on the radius of contact between the Li and SSE relative to the thickness of the Li anode.
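Why length scale matters for diffusional relief is visible in the standard Nabarro-Herring creep law. For a characteristic diffusion distance d, it gives a strain rate of ε̇ = A·D_v·Ω·σ/(k_B·T·d²). The sketch below inverts this relation to estimate the stress needed to sustain a given strain rate. The prefactor A (taken as 10), the diffusivity, and the atomic volume are placeholder inputs, not measured lithium parameters; the point is only the d² scaling, which makes diffusional flow ineffective as the diffusion distance grows.

```python
BOLTZMANN_J_PER_K = 1.380649e-23


def nabarro_herring_stress(strain_rate: float, length_m: float, diffusivity_m2_s: float,
                           atomic_volume_m3: float, temperature_k: float,
                           prefactor: float = 10.0) -> float:
    """Stress (Pa) at which Nabarro-Herring creep reaches the given strain rate.

    Inverts  strain_rate = A * D * Omega * sigma / (k_B * T * d**2).
    """
    return (strain_rate * BOLTZMANN_J_PER_K * temperature_k * length_m ** 2
            / (prefactor * diffusivity_m2_s * atomic_volume_m3))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    params = dict(strain_rate=1e-3, diffusivity_m2_s=1e-15,
                  atomic_volume_m3=2e-29, temperature_k=298.0)
    for d in (1e-7, 1e-6, 1e-5):
        sigma = nabarro_herring_stress(length_m=d, **params)
        print(f"d = {d:.0e} m -> sigma = {sigma:.2e} Pa")
```

Each tenfold increase in d raises the required stress a hundredfold, consistent with the text's point that large stressed volumes cannot rely on diffusional flow.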


Fracture 


Failure of the Li metal to relieve stress, or localized gradients in elastic strain at the cathode-electrolyte interface, can require stress relief from the SSE. In general, crack tips in crystalline materials are sources of dislocations that support plastic deformation to reduce stress. However, in ionic crystals (i.e., cathode materials and ceramic electrolytes), dislocation nucleation must overcome a large energy penalty; therefore, ionic compounds are inherently brittle even at very small length scales, despite their much higher cohesive energy relative to metals. Although related to ductility, resistance to fracture is quantified by the fracture toughness (K_c), which is a function of the critical stress required to propagate an existing crack. For multiple ionic and covalent materials, fracture toughness appears to correlate well with Pugh's ratio and the volume per atom (45). However, even for relatively tough ionic crystals (e.g., cubic spinel MgAl2O4), the fracture toughness is capped at ~2 MPa m^1/2, which indicates that such materials will readily crack given sufficient pressure inside the initial defects.
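The warning about pressurized defects can be made quantitative with linear elastic fracture mechanics. For an idealized penny-shaped crack of radius a loaded by a uniform internal pressure p, the stress intensity is K_I = 2p√(a/π). The crack therefore grows once p exceeds K_c√π/(2√a). The sketch below assumes this ideal geometry (real Li-filled defects are not perfect penny cracks) and uses K_c = 1 MPa m^1/2 as an illustrative value within the range discussed here.

```python
import math


def penny_crack_stress_intensity(pressure_pa: float, radius_m: float) -> float:
    """Mode I stress intensity (Pa*m^0.5) of a penny crack under uniform internal pressure."""
    return 2.0 * pressure_pa * math.sqrt(radius_m / math.pi)


def critical_internal_pressure(k_c: float, radius_m: float) -> float:
    """Internal pressure (Pa) at which K_I reaches the fracture toughness k_c."""
    return k_c * math.sqrt(math.pi) / (2.0 * math.sqrt(radius_m))


if __name__ == "__main__":
    K_C = 1.0e6  # Pa*m^0.5, illustrative toughness of a brittle ceramic
    for radius_um in (0.1, 1.0, 10.0):
        p_c = critical_internal_pressure(K_C, radius_um * 1e-6)
        print(f"a = {radius_um:>5} um -> p_c = {p_c / 1e6:.0f} MPa")
```

Under these assumptions, a 1-μm defect cracks at roughly 900 MPa, and a 10-μm defect at under 300 MPa. The latter is comparable to the stresses that small Li volumes have been found to support.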

The same conclusion about limited fracture toughness can be drawn from the available data on inorganic ceramic electrolytes (46-53): The fracture toughness of these electrolytes is generally below 2 MPa m^1/2. In separate studies, nanoindentation along grain boundaries of several lithium lanthanum zirconate (LLZO) materials also showed little difference between the interior of a grain and the volume near a grain boundary (51). All methods applied to well-densified LLZO samples yielded comparable fracture toughness values, indicating low resistance to cracking in ion-conducting garnets, whether in polycrystalline or single-crystal form. Sulfide electrolytes demonstrate even lower resistance to fracture, in the range of 0.2 MPa m^1/2 (54). Dense halide materials would also be expected to exhibit similar fracture behavior.

Literature reports on the micro- and nanomechanical behavior of cathode materials are scarce, but the available data indicate a similar tendency to fracture. The single-crystal fracture toughness of LiNi0.5Mn0.3Co0.2O2 was reported as 0.3 MPa m^1/2, and that of polycrystalline agglomerate particles was measured as 0.1 MPa m^1/2 (55). The fracture toughness of sintered LiCoO2 pellets (56) varied widely between 0.2 and 6.5 MPa m^1/2, although two-thirds of nanoindentations produced fracture toughness below 1.1 MPa m^1/2. The fracture toughness of LiMn2O4, determined by micropillar splitting, was measured as 0.24 MPa m^1/2 (57). Overall, cathode materials are brittle, with K_c ~ 1 MPa m^1/2. This underscores the need to develop materials that do not produce hard contacts and that can relieve stress buildup by means other than fracture.


Plastic deformation in ceramics 


The major sources of stress in an SSB include (i) Li plating into defects in the solid electrolyte, (ii) stress due to expansion of cathode particles constrained by the solid electrolyte, and (iii) externally applied stress to the battery as a typical engineering control. The goal of SSB engineering is a combination of battery materials that can deform reversibly and limit the stress without fracturing. Whereas limiting stress buildup by either diffusional flow or dislocation glide is a suitable mechanism for metallic Li, ceramic electrolytes do not activate slip systems at room temperature and instead fracture. One approach that merits further investigation is the intentional introduction of crystallographic defects into the material at the processing stage. In this case, the material is toughened not by the generation of dislocations but by the motion of existing dislocations. The key, therefore, is to intentionally introduce high dislocation densities so that there is a high probability of finding enough dislocations in a small volume around a crack tip (Fig. 3). Processing methods such as flash sintering and aerosol deposition can potentially achieve this goal. This has been demonstrated for TiO2 by flash sintering, which introduces stacking faults and dislocations (58). The material was able to deform up to 10% in compression without fracture, and slip bands with localized strain were observed, just as in the deformation of metallic materials. Recent research (59) demonstrated the possibility of introducing dislocations and improving ductility in LLZO.
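Why a high initial dislocation density matters can be illustrated with a simple counting estimate. If dislocation lines of density ρ (line length per unit volume, in m⁻²) are randomly distributed, a region of linear size L around a crack tip is crossed by about ρL² of them on average. Assuming Poisson statistics, the chance of finding at least one is 1 − exp(−ρL²). Both the geometric estimate and the Poisson assumption are simplifications chosen for illustration.

```python
import math


def expected_dislocations(density_m2: float, size_m: float) -> float:
    """Rough expected number of dislocation lines crossing a region of size L (rho * L^2)."""
    return density_m2 * size_m ** 2


def probability_at_least_one(density_m2: float, size_m: float) -> float:
    """Poisson probability that the region contains at least one dislocation."""
    return 1.0 - math.exp(-expected_dislocations(density_m2, size_m))


if __name__ == "__main__":
    region = 1e-6  # m, hypothetical crack-tip region size
    for rho in (1e10, 1e12, 1e14):
        p = probability_at_least_one(rho, region)
        print(f"rho = {rho:.0e} m^-2 -> P(at least one in 1 um) = {p:.3f}")
```

The same counting argument helps explain the size effect in Li discussed earlier: as L shrinks, the expected number of available dislocations drops as L².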

Amorphous materials will relax toward a crystalline structure, whereas noncrystalline materials (i.e., glasses) relax toward a liquid structure (60). Typically, glasses are cast from a melt, and the supercooled liquid provides a network accommodating the mobile Li+ ions. Amorphous materials are formed by sputter deposition, condensation from a plasma torch, high-energy milling, and other ultrarapid quench processes. The absence of a bonded glass-former framework allows for a higher Li content. The major composition families are oxides, sulfides, and a variety of mixed compositions. At the macroscale, these materials are brittle, yet local densification and shear flow may allow stress reduction without fracture (Fig. 3). This is believed to be critical to the behavior of the Li interface. These processes manifest differently depending on composition and mechanical properties, of which the modulus-to-hardness ratio (E/H) and Poisson's ratio seem to have the most influence.

One example of an amorphous solid electrolyte that exhibits a high resistance to fracture is lithium phosphorus oxynitride (Lipon) (61). Cells constructed with this amorphous thin-film solid electrolyte have been cycled more than 10,000 times with 95% capacity retention and no Li penetration (62). In addition, current densities as high as 10 mA/cm² have been demonstrated (62). Studies of the mechanics of amorphous Lipon are limited (63, 64) but show a robust material when prepared as a thin film. The modulus and hardness measured by nanoindentation give an E/H ratio of ~23. This value is higher than that of typical oxide glasses (E/H = 10 to 13), indicating some degree of ductility in Lipon. This ductile behavior was further revealed in (64), which showed that Lipon can densify and deform in shear to reduce the stress intensity.


Time-dependent mechanical behavior 


Stress relief in glasses and amorphous materials in general can be achieved by inelastic deformation partitioned into isochoric shear and densification (Fig. 3). Such partitioning has long been a rationale for classifying glasses as “normal” and “anomalous” (65, 66). It is known that stress-induced densification in glass can be partially recoverable, unlike permanent plastic shear flow. This recovery is achieved by annealing at temperatures below the glass transition temperature (67, 68). Silica glass is a good example of a material in which inelastic deformation proceeds exclusively through densification, i.e., an anomalous glass. The anomalous behavior is further demonstrated by the increase in brittleness with decreasing density (66). Estimates of the plastic (densification) deformation energy in silica glass give values very close to the activation energy for recovery in this material (~50 kJ/mol) (69).

Fig. 3. Avoiding fracture by triggering plasticity through densification and shear flow in amorphous materials and toughening by introducing dislocations in crystalline ceramics.
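If recovery of densification is treated as a single thermally activated process with the ~50 kJ/mol activation energy quoted above, an Arrhenius factor exp(−E_a/RT) gives the relative recovery rate at different annealing temperatures. The single-activation-energy form and the temperatures chosen below are simplifying assumptions for illustration.

```python
import math

GAS_CONSTANT_J_PER_MOL_K = 8.314462618


def arrhenius_rate_ratio(activation_energy_j_mol: float, t_low_k: float, t_high_k: float) -> float:
    """Ratio k(t_high) / k(t_low) for a rate proportional to exp(-Ea / (R T))."""
    return math.exp(activation_energy_j_mol / GAS_CONSTANT_J_PER_MOL_K
                    * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    EA = 50e3  # J/mol, recovery activation energy for silica glass quoted in the text
    for t_high in (373.15, 473.15, 573.15):
        ratio = arrhenius_rate_ratio(EA, 298.15, t_high)
        print(f"{t_high - 273.15:.0f} C vs 25 C: recovery ~{ratio:.0f}x faster")
```

With this activation energy, annealing at 300 °C would accelerate recovery by roughly four orders of magnitude relative to room temperature, while remaining far below the glass transition of silica.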

A glass below its glass transition temperature has the status of a “frozen liquid” because its ability to flow is constrained. The Maxwell relaxation time for such a material would approach ~10^10 s. However, under sufficiently high stresses, such as those achievable by nanoindentation, viscoelastic deformation and creep have been reported in sulfide glassy ionic conductors at room temperature (70). Nanoindentation of Li2S-P2S5 (LPS) demonstrated both creep and relaxation within one load-unload cycle. The latter was attributed to the reversal of viscoelastic strain (70).
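A Maxwell element (a spring of shear modulus G in series with a dashpot of viscosity η) relaxes a held stress as σ(t) = σ₀·exp(−t/τ), with τ = η/G. The sketch below uses illustrative values only: η = 10¹² Pa·s, the viscosity conventionally associated with the glass transition, and G = 10¹⁰ Pa, which give τ = 100 s. Below the glass transition, η and therefore τ rise steeply.

```python
import math


def maxwell_relaxation_time(viscosity_pa_s: float, shear_modulus_pa: float) -> float:
    """Relaxation time tau = eta / G of a Maxwell element."""
    return viscosity_pa_s / shear_modulus_pa


def relaxed_stress(initial_stress_pa: float, time_s: float, tau_s: float) -> float:
    """Stress remaining after holding a fixed strain for time_s."""
    return initial_stress_pa * math.exp(-time_s / tau_s)


if __name__ == "__main__":
    tau = maxwell_relaxation_time(1e12, 1e10)  # illustrative glass near T_g
    for t in (1.0, tau, 10 * tau):
        print(f"t = {t:>6.0f} s -> sigma/sigma0 = {relaxed_stress(1.0, t, tau):.3f}")
```

A linear Maxwell model uses a stress-independent viscosity, so it cannot by itself explain why the very high local stresses under an indenter make creep observable at room temperature.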

Research on the deformation behavior and fracture of ion-conducting amorphous materials and glasses is rather limited. However, partial recovery similar to that in LPS glass has been observed at room temperature in Lipon (63, 64). On the basis of molecular dynamics (MD) simulations, it has been suggested that densification in Lipon occurs through changes in P-O-P bond angles (64). This structural change could be behind the reversible viscoelastic strain. Simulating the recovery of densification, however, is infeasible because the required time scales are beyond the reach of MD methods. The ability to recover the densified volume at least partially, without requiring external energy input, warrants further investigation. Under cyclic loading, such partial recovery creates hysteresis-like cyclic behavior (Fig. 4).
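The work dissipated in one load-unload cycle equals the area enclosed between the loading and unloading curves, closed along the zero-load axis. The sketch below evaluates that area with the shoelace formula for a loop sampled as (displacement, load) points. The data in the usage example are made-up placeholders, not the Lipon measurements of (64).

```python
def loop_area(points: list[tuple[float, float]]) -> float:
    """Area enclosed by a closed polygon of (x, y) vertices, via the shoelace formula."""
    twice_area = 0.0
    for k, (x0, y0) in enumerate(points):
        x1, y1 = points[(k + 1) % len(points)]  # last vertex connects back to the first
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    # Hypothetical loop as (displacement in nm, load in mN); nm * mN = 1 pJ.
    loading = [(0.0, 0.0), (50.0, 4.0), (100.0, 10.0)]
    unloading = [(80.0, 4.0), (60.0, 0.0)]
    print(f"dissipated energy ~ {loop_area(loading + unloading):.0f} pJ")
```

Tracking this area over successive cycles is one way to quantify how much of the deformation recovers between cycles.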


Electrochemical fatigue 


Although fracture has been discussed in the context of stress relief, the origins of fracture are often much more complex. In traditional structural materials, cyclic stresses and strains can cause damage accumulation that eventually results in fracture. Active electrode materials respond to cyclic electrochemical loading, caused by repeated insertion and removal of lithium from the host structure, in a way similar to the response of structural materials to cyclically applied mechanical forces. For cathodes, the resulting changes lead to irreversible damage accumulation on two different length and time scales, driven by different mechanisms: (i) intergranular fracture in polycrystalline cathode particles and (ii) lithiation-induced dislocation dynamics and transgranular fracture in single cathode particles. Whereas the first type of damage has been well documented (71) in liquid electrolyte cells, dislocation-driven damage is less well understood and seems to be unavoidable in layered oxide materials (72, 73).

Cyclic electrochemical straining of electrode particles results in dimensional changes that are sufficient to propagate interfacial cracks between solid electrolytes and cathode active materials. Additional cracks can form within the solid electrolyte, either as extensions of the interfacial cracks or as new fracture surfaces, as ways to reduce large and complex stresses in SSBs (Fig. 5). Available experimental evidence suggests that most such interfacial fracturing occurs within the first cycle and is responsible for the initial capacity loss (74). The evolution of such cracks, however, can be a cyclic process reminiscent of fatigue crack growth; at present, there is insufficient experimental information to support or reject this hypothesis with confidence. Continuous capacity loss with cycling in NMC811/β-Li3PS4 cathodes suggests gradual accumulation of damage associated with cracking (74). Current practice is to counteract the effects of cracking in composite cathodes by applying external pressure to the cell. This, however, creates a complex stress state inside the cathode, with possible local regions of high shear that can induce further cracking (75).
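If interfacial crack growth does follow a fatigue-like law, as hypothesized above, the classic Paris relation da/dN = C(ΔK)^m with ΔK = YΔσ√(πa) would give the number of cycles needed to grow a crack from a₀ to a_f. The sketch below integrates this relation in closed form. C, m, Y, and the stress range are hypothetical, and, as the text stresses, fatigue-like growth in SSBs has not yet been confirmed.

```python
import math


def paris_cycles(a0: float, af: float, c: float, m: float,
                 stress_range: float, geometry_factor: float = 1.0) -> float:
    """Cycles to grow a crack from a0 to af under da/dN = C * dK**m, dK = Y*dS*sqrt(pi*a)."""
    k_coeff = c * (geometry_factor * stress_range * math.sqrt(math.pi)) ** m
    if math.isclose(m, 2.0):
        # Special case: the integral of a**(-1) is logarithmic.
        return math.log(af / a0) / k_coeff
    exponent = 1.0 - m / 2.0
    return (af ** exponent - a0 ** exponent) / (exponent * k_coeff)


if __name__ == "__main__":
    # Hypothetical parameters in SI units (m, Pa); not measured SSB values.
    n = paris_cycles(a0=1e-7, af=1e-6, c=1e-30, m=4.0, stress_range=50e6)
    print(f"cycles to grow the crack tenfold: {n:.3g}")
```

Because most of the cycles are spent while the crack is still small, such a model would be sensitive to the initial flaw size left after the first-cycle fracturing described above.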


Adhesion and friction 


The adhesion or wetting of Li metal to the solid electrolyte, as well as the adhesion of the cathode active material to the solid electrolyte, is key to robust cyclic performance of an SSB. Progress has been made in understanding how to promote the wetting of LLZO by metallic Li, and the utility of an externally applied compressive stack pressure has been demonstrated as well. Adhesion and friction, two of the key variables directly related to maintaining coherent interfaces, are among the most poorly understood. The knowledge gap is primarily due to the experimental challenges associated with measuring adhesion and quantifying the effects of friction. Techniques such as, but not limited to, bulge testing, scratch testing, and nanoindentation have been used extensively by the microelectronics and thin-film industries to measure adhesion. Despite these efforts, quantitative measurements of adhesion remain elusive, and crude methods such as the Scotch tape pull test are still widely practiced today. In transitioning to SSBs, accurate measurements of adhesion will be further complicated by length-scale effects and battery-relevant operating conditions (temperature and strain rate). Friction becomes important for maintaining coherent interfaces under stack pressure. For example, at the Li-SSE interface, the outward radial flow of Li is arrested by a nonuniform interface shear stress that depends on the coefficient of friction, the stack pressure, the relative thickness of the Li (which changes with cycling), and the friction conditions (76).

Fig. 4. Deformation recovery in Lipon, leading to hysteresis-like behavior upon nanoindentation with cyclic loading. Lipon, lithium phosphorus oxynitride (64).
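A classical way to see how friction and Li thickness shape the interface stress is the slab ("friction hill") analysis of a thin, perfectly plastic disk compressed between two plates with Coulomb friction. For a disk of radius R, thickness h, flow stress Y, and friction coefficient μ on both faces, the contact pressure rises from Y at the free edge to p(r) = Y·exp(2μ(R − r)/h) toward the center. Its area average is Y·2(eˣ − 1 − x)/x² with x = 2μR/h. Treating Li as a rate-independent plastic disk with uniform Coulomb friction is an assumption for illustration, and the analysis in (76) may differ. All parameter values below are hypothetical.

```python
import math


def contact_pressure(r: float, radius: float, thickness: float, mu: float, flow_stress: float) -> float:
    """Friction-hill pressure at radius r of a plastic disk with Coulomb friction on both faces."""
    return flow_stress * math.exp(2.0 * mu * (radius - r) / thickness)


def mean_contact_pressure(radius: float, thickness: float, mu: float, flow_stress: float) -> float:
    """Area-averaged pressure, i.e. the stack pressure needed to make the whole disk flow."""
    x = 2.0 * mu * radius / thickness
    if x < 1e-8:
        return flow_stress  # frictionless limit
    return flow_stress * 2.0 * (math.expm1(x) - x) / (x * x)


if __name__ == "__main__":
    Y = 1.0e6     # Pa, order of the bulk Li flow stress cited earlier
    R = 100e-6    # m, hypothetical contact radius
    MU = 0.1      # hypothetical friction coefficient
    for h in (100e-6, 20e-6, 10e-6):
        ratio = mean_contact_pressure(R, h, MU, Y) / Y
        print(f"h = {h * 1e6:>3.0f} um -> mean pressure / Y = {ratio:.3g}")
```

As the Li thins relative to the contact radius, the pressure needed to keep it flowing rises. The Coulomb form holds only while μp stays below the shear flow strength of Li; beyond that, sticking friction applies and the exponential no longer holds.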


Lithium propagation in solid electrolytes 


On the basis of our current understanding of solid electrolyte failure, crack formation plays an important role in Li propagation through the ceramic electrolyte separator. Propagation of metallic lithium has been documented along grain boundaries (7) as well as through single grains [single-crystal Li6La3ZrTaO12 garnet (9)]. Such penetration can be fast: In one experiment, a short circuit occurred within 37 s of 10 mA/cm² current flow through a 2-mm-thick LLZO single crystal (9).
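The numbers in this experiment imply an average penetration velocity of about 2 mm / 37 s ≈ 54 μm/s. For comparison, Faraday's law gives the thickness growth rate of uniform, planar Li plating at the same 10 mA/cm², assuming dense Li at theoretical density. This is a back-of-the-envelope check, not an analysis taken from (9).

```python
FARADAY_C_PER_MOL = 96485.0
LI_MOLAR_MASS_G_PER_MOL = 6.94
LI_DENSITY_G_PER_CM3 = 0.534


def planar_plating_velocity_um_s(current_ma_cm2: float) -> float:
    """Thickness growth rate (um/s) of uniform, dense Li plating from Faraday's law."""
    mol_per_cm2_s = current_ma_cm2 * 1e-3 / FARADAY_C_PER_MOL
    cm_per_s = mol_per_cm2_s * LI_MOLAR_MASS_G_PER_MOL / LI_DENSITY_G_PER_CM3
    return cm_per_s * 1e4


if __name__ == "__main__":
    penetration = 2000.0 / 37.0  # um/s: 2-mm crystal shorted in 37 s
    planar = planar_plating_velocity_um_s(10.0)
    print(f"penetration ~ {penetration:.0f} um/s, planar plating ~ {planar:.4f} um/s")
    print(f"ratio ~ {penetration / planar:.0f}")
```

A front advancing roughly 4000 times faster than uniform plating is consistent with localized filament growth and crack propagation rather than bulk filling of the electrolyte.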

Intergranular penetration of alkali metal is a typical problem for polycrystalline ceramics, with observations dating back to the development of sodium beta-alumina solid electrolytes (77, 78). In early models, propagation of sodium was described as flow in a small channel with a simple, regular cross section, driven by the Poiseuille pressure (78). A similar approach is taken in (79), in which Li stress relief is described using a Poiseuille model. Physically, the Li-filled interface defect is modeled as a long, skinny needle. Under these conditions, the state of stress near the apex of the Li-filled needle is hydrostatic; thus, dislocation glide is inoperable. The diffusion length in this defect geometry is also far too long for efficient transport of lithium. The net effect is that the needle-like defect geometry completely circumvents the Li stress-relief mechanisms. Thus, the pressure climbs until it is limited by the critical stress required to create dislocations ex nihilo or by fracture of the SSE, or until, as per Barroso-Luque et al. (79), it saturates because a change in the local potential means that Li+ no longer plates at the defect interface.
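The basic Hagen-Poiseuille relation behind such channel-flow models shows why narrow defects are dangerous. For laminar flow of a Newtonian fluid through a cylinder of radius r and length L at volumetric flow rate Q, the pressure drop is Δp = 8ηLQ/(πr⁴). Treating Li as a Newtonian fluid with an effective viscosity η is the modeling assumption here, and the specific formulations in (78) and (79) may differ. The parameter values below are hypothetical and only illustrate the scaling.

```python
import math


def poiseuille_pressure_drop(viscosity_pa_s: float, length_m: float,
                             flow_rate_m3_s: float, radius_m: float) -> float:
    """Hagen-Poiseuille pressure drop for laminar flow through a cylindrical channel."""
    return 8.0 * viscosity_pa_s * length_m * flow_rate_m3_s / (math.pi * radius_m ** 4)


if __name__ == "__main__":
    # Hypothetical effective viscosity (Pa*s), channel length (m), and flow rate (m^3/s).
    eta, length, q = 1e9, 10e-6, 1e-21
    for r in (1e-6, 0.5e-6, 0.25e-6):
        dp = poiseuille_pressure_drop(eta, length, q, r)
        print(f"r = {r * 1e6:.2f} um -> dp = {dp / 1e6:.3g} MPa")
```

At a fixed plating current (fixed Q), halving the channel radius raises the required pressure sixteenfold, and a longer channel raises it linearly.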

Once the pressure creates an initial crack in the SSE, the crack can propagate either along grain boundaries or transgranularly until the local stress intensity falls below the threshold for crack propagation. Initial, mostly postmortem, studies showed predominantly intergranular lithium penetration in garnet electrolytes (7), whereas more recent in situ studies demonstrated that cracking could proceed both along grain boundaries and through grains (80). These latter observations can be rationalized by the fact that in LLZO garnets, the single-crystal and polycrystalline fracture toughness values are very close (49-51). In some instances, the single-crystal fracture toughness of Li6La3ZrTaO12 was measured to be as low as 0.6 MPa m^1/2 (9), explaining the predominantly transgranular crack propagation observed in (80). Because crack propagation in brittle materials occurs much faster than the subsequent Li plating, fractures penetrating the entire electrolyte thickness yet devoid of lithium have been documented (81).

Most theoretical treatments of Li-induced failure treat lithium filaments as propagating from the metal-electrolyte interface toward the bulk of the electrolyte (8, 78) (mode I degradation). However, the reduction of lithium and the subsequent formation of lithium deposits readily happen within the electrolyte (80, 82, 83), away from the interface with lithium (mode II degradation). Such initial deposition can be driven by the reduced bandgap at the grain boundaries of the electrolyte (84). A lower bandgap indicates increased electronic conductivity and thus the possibility of lithium cation reduction within the electrolyte rather than at the electrode. This is a fundamentally important observation that distinguishes solid electrolytes from liquid electrolytes, in which lithium plating can occur only at the electrode surface. Building on transmission electron microscopy observations of lithium filling the void at the triple junction connecting grain boundaries (84), it seems reasonable to suggest that the mechanics analysis based on the dominant stress-relief mechanism still applies and that, in the absence of flow in lithium, stress relief can proceed by cracking of the electrolyte. Bowl-shaped spalled fragments due to Li-induced cracking have been observed and characterized (80, 81). Finally, one can envision a situation in which lithium plates uniformly along the grain boundaries in a polycrystalline ceramic electrolyte, thus traversing the electrolyte without the need for crack propagation. This could happen at very high leakage currents when high current densities are applied within the cell. The three scenarios are shown schematically in Fig. 6.

Recent studies examining the relationship between critical current density (CCD) and temperature further highlight the connection between Li's mechanical behavior and cell performance (85). Beyond identifying a similar trend in Li's self-diffusion coefficient, these studies rationalize the temperature-dependent CCD of solid Li strictly in terms of the mechanical properties of bulk Li. At 195°C, the CCD of molten Li increases by an order of magnitude. The improvement is notable but not surprising, because the self-limiting flow of molten Li cannot support, at any length scale, the stress required to initiate fracture of the electrolyte.


Conclusions 


Recent research has provided insight into the origins of strain and the mechanisms of stress relief within each component of an SSB. Perhaps one of the most vital lessons is that at small length scales, Li is more than 100 times stronger than in bulk and cannot relax the stresses that build up at the interface during Li plating. This necessitates stress relief through the solid electrolyte and typically leads to failure. Cell failure through electrolyte fracture by propagating lithium is the most critical and most frequently studied type of failure leading to a short circuit. Less pronounced than a sudden short, but still highly deleterious, is the reduction in cell capacity under charge-discharge cycling, which is related to the formation of cracks at the cathode/solid electrolyte interface. Both failure modes are directly related to the length scale- and rate-dependent mechanics of lithium, the solid electrolyte, and the cathode active material, and to their ability to dissipate strain energy without fracture. Although much progress has been made in understanding stress relief in these critical materials, major gaps in our understanding remain.

Fig. 6. Schematic of lithium propagation through solid electrolyte (panels: Mode I, surface defect; Mode II, volume defect). Mode I indicates propagation from a surface defect and in general does not require crystallinity. Mode II refers to the scenario of Li plating within electrolyte voids with subsequent trans- or intergranular crack growth. The third case is uniform plating of lithium along the grain boundaries.


We have presented a review of SSB mechanics and set out a general framework in which to conceptualize and design mechanically robust SSBs, namely (i) identifying and understanding the sources of localized strain; (ii) understanding the stresses generated by this strain, in particular at the battery interfaces, and how the battery materials respond to such stresses; and (iii) designing battery materials and battery cells with the needed stress and strain evolution. Using this framework, we reviewed the mechanics of the various battery materials typically used in the SSB community.


Our goal with this work is to enable researchers in the SSB and mechanics communities to understand many of the underlying sources of SSB failure and to design solutions to these problems, including: (i) stress relief mechanisms in Li metal as a function of length scale, temperature, and strain rate (current density); (ii) stress relief mechanisms in ceramics, glasses, and amorphous ceramics as a function of length scale, temperature, and strain rate; (iii) engineering ductility into ceramic and/or glassy electrolytes; (iv) designing Li metal anodes that can either eliminate inhomogeneous plating and stripping of Li metal or relieve stresses at the Li-electrolyte interface; (v) engineering cathode active materials that either exhibit zero strain on cycling, are resistant to fracture, or have some measure of ductility; (vi) designing composite cathodes to minimize strain and maximize stress relief; and (vii) detailed modeling to help describe the evolution of stress and strain in SSBs that includes length-scale effects, friction, adhesion, and creep.
INTRODUCTION: Gene expression is regulated by transcription factor (TF) proteins that bind DNA-regulatory elements in the genome. Despite decades of research cataloging TF “motifs,” these do not fully explain observed genomic binding in cells. Many TFs bind regions lacking motifs, whereas other regions with apparently strong motifs remain unoccupied, and emerging evidence suggests that the DNA sequence context surrounding motifs can strongly affect binding (see the figure, panel A). Short tandem repeats (STRs, consecutively repeated units of one to six nucleotides) provide a good example of these sequence contexts. STRs comprise ~5% of the human genome (compared with 1.5% for all protein-coding genes) and are enriched in enhancers. Variations in STR length have been associated with changes in gene expression and implicated in several complex phenotypes, such as schizophrenia, cancer, autism, and Crohn’s disease. However, the mechanism by which STRs affect transcription remains unknown.


RATIONALE: One mechanism by which STRs could affect gene expression is by altering the affinity and/or kinetics of TF binding to regulatory DNA (see the figure, panel A). To investigate this, we used various high-throughput microfluidic binding assays (i.e., MITOMI, k-MITOMI, and STAMMP) and bioinformatic analyses to systematically quantify the impacts of different sequence contexts on TF binding. We measured affinities (Kds) and kinetics (koffs) for two basic helix-loop-helix TFs that bind a CACGTG E-box motif (Pho4 from Saccharomyces cerevisiae and MAX from Homo sapiens) binding to DNA sequences with or without an E-box surrounded by random sequence or multiple different types of STRs (see the figure, panel B).

[Figure panels: (A) enhancer, motif, and STRs with TF occupancy; (B) TFs tested (Pho4, MAX) and DNA libraries with a motif flanked by random or STR sequences (e.g., AAAAAAAA, CCCCCCCC, ATATATAT, ACACACAC, AGAGAGAG, ATCCATCC); (C, D) preferred and nonpreferred binding sites; (E) STR binding classified as significant or not significant]
STRs directly bind TFs to alter gene expression. (A) Schematic of enhancers, motifs, and STRs. (B) Schematic of TFs and DNA libraries tested in this study. (C) Favorable STRs alter energetic landscapes by directly binding TF DNA-binding domains. (D) Favorable STRs maximize potential preferred sites and contribute additively to binding energies. (E) STR binding by TFs is widespread.


RESULTS: Measured binding constants (Kds) for 609 distinct TF-DNA combinations revealed that different STRs can alter binding affinities by >70-fold (see the figure, panel C), approaching or exceeding effects from mutating the consensus motif. Preferred STRs differed for Pho4 and MAX, demonstrating that motifs are not sufficient to predict preferred STRs. Gel-shift assays and additional experiments using TF truncation constructs established that TFs directly bind STRs (see the figure, panel C) through their DNA-binding domains in the presence or absence of motifs. Although not predicted by standard mononucleotide models, the observed STR binding is well explained by a simple partition function model from statistical mechanics in which multiple repeated weak binding sites contribute additively to binding affinity (see the figure, panel D). Measured apparent dissociation rates (koffs) for 106 TF-DNA combinations and kinetic modeling suggested that STRs primarily alter macroscopic apparent association rates and increase the local density of DNA-bound TFs. Finally, neural networks trained only on in vivo genome-wide chromatin immunoprecipitation data predict effects identical to those measured in vitro, suggesting that STR preferences play a substantial role in properly localizing TFs in cells.


CONCLUSION: Analysis of previously published protein-binding microarray and SELEX data suggests that ~90% of eukaryotic TFs preferentially bind at least one type of STR (see the figure, panel E). Because STRs are highly mutable, we propose that they should be considered an easily evolvable class of cis-regulatory elements. Preferred STRs need not resemble known motifs, suggesting a mechanism by which TF paralogs can be recruited to different regulatory regions and regulate distinct target genes. Although STRs maximize the number of potential weak binding sites, we anticipate that nonrepetitive sequence contexts containing many low-affinity binding sites should similarly increase binding. Thus, we propose that STRs function as “rheostats” to tune local TF concentration and binding responses to regulate gene expression in disease, development, and homeostasis.
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.


Short tandem repeats (STRs), consisting of 1- to 6-base pair (bp) units repeated consecutively, comprise 5% of the human genome (figs. S1 to S3), compared with 1.5% for protein-coding genes (1, 2), with a median STR length of 29 bp (fig. S4). STRs are enriched in cis-regulatory elements across eukaryotic genomes (3), including in humans [~25% of enhancers contain an STR (4, 5); fig. S5], and can activate or repress transcription in Homo sapiens (5-21), Mus musculus (22, 23), Saccharomyces cerevisiae (3), Drosophila melanogaster (24, 25), and others (26). Dinucleotide STRs are associated with broad activity of cis-regulatory elements across cell types in D. melanogaster (27), and variation in STRs has been proposed to account for “missing heritability” in genome-wide association studies (5, 28). Finally, population-level genomic studies have linked noncoding STR polymorphisms to autism (29, 30), schizophrenia (31), height (37), and Crohn’s disease (5). Despite their widespread prevalence and documented effects on gene expression, the physical mechanism by which STRs affect transcription remains unclear. STRs have been proposed to modulate transcription by changing the intrinsic affinity of histone proteins for DNA, thereby changing nucleosome occupancy (3, 22, 24, 32). However, STRs have not been shown to directly alter chromatin accessibility beyond the example of nucleosome-disfavoring poly(A) tracts (33). Alternatively, polymorphisms in STR length could alter distances between multiple motifs or between motifs and core promoter elements, disrupting regulatory grammar (34-36). However, genome-wide studies suggest that the syntax of cooperative transcription factor (TF) interactions at enhancers is unlikely to be perturbed by changes in motif spacing (37-39). Theoretical work has suggested that “sequence symmetries” (i.e., repetitiveness) alone contribute to nonspecific TF binding, with maximum effects for homopolymer sequences (40, 41), and in vitro binding measurements and bioinformatic analyses have suggested that STRs affect TF-DNA binding in the absence of specific base pair recognition (40, 42-46). Nevertheless, prevailing models of TF specificity do not predict observed specific binding to STRs (42), and the magnitudes of their energetic effects and potential impacts on binding kinetics remain unexplored. Here, we used multiple high-throughput microfluidic binding assays [MITOMI (47, 48), k-MITOMI (49), and STAMMP (50)] to systematically investigate how STRs influence equilibrium binding and kinetics for two different basic helix-loop-helix (bHLH) TFs.


Results 
Quantitative measurements establish STRs alter TF binding affinities
The bHLH TFs Pho4 [a yeast TF involved in the phosphate starvation response (51, 52)] and MAX [a human TF involved in cell proliferation, differentiation, and apoptosis (53, 54)] each bind an E-box regulatory element (Fig. 1A and table S1). To test the impact of STRs on binding, we used MITOMI microfluidic binding assays (Fig. 1C, figs. S6 to S9, and table S3) to quantify the binding of each TF to 17 DNA sequences containing either a moderate-affinity extended E-box sequence (GTCACGTGAC) or a random sequence (“no motif”) flanked by 13 bp of either random sequence or STRs previously shown to enhance binding (42) (Library 1; Fig. 1B and table S2). Measured binding for each DNA sequence over multiple concentrations can be combined with calibration curves (figs. S10 and S11 and table S4) to extract the dissociation constant (Kd) by quantifying concentration-dependent TF binding and globally fitting Langmuir isotherms (Fig. 1C; see the materials and methods).
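As a minimal sketch of the isotherm fit, assuming a single-site Langmuir model, noise-free toy data, and arbitrary concentration units (the helper names `langmuir` and `fit_langmuir` are hypothetical), the code below fits one binding curve. It is not the authors' global-fitting procedure, which also incorporates calibration curves; here Kd is found by a simple log-spaced grid search, using the fact that for a fixed Kd the saturation amplitude has a closed-form least-squares solution.

```python
def langmuir(conc, kd, bmax):
    """Bound signal for a single-site Langmuir isotherm."""
    return bmax * conc / (conc + kd)

def fit_langmuir(conc, signal, kd_min=1e-2, kd_max=1e5, n_grid=4000):
    """Least-squares fit of signal = bmax * c / (c + kd).

    For a fixed kd the model is linear in bmax, so bmax has a closed form;
    kd is chosen from a log-spaced grid (a simple stand-in for a
    nonlinear optimizer). Returns (kd, bmax).
    """
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        f = [c / (c + kd) for c in conc]
        bmax = sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(signal, f))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]

conc = [1, 3, 10, 30, 100, 300, 1000]            # hypothetical concentrations (nM)
signal = [langmuir(c, 50.0, 2.0) for c in conc]  # simulated bound signal
kd_hat, bmax_hat = fit_langmuir(conc, signal)
print(f"Kd ~ {kd_hat:.1f} nM, Bmax ~ {bmax_hat:.2f}")
```

A grid search is slower than a gradient-based optimizer but has no starting-point sensitivity, which keeps the sketch dependency-free.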
Measured Library 1 ΔΔGs spanned ~2.6 and 3.1 kcal/mol, with a mean root mean squared error (RMSE) between replicates of ~0.53 and 0.31 kcal/mol for Pho4 and MAX, respectively [Fig. 1D, figs. S12 to S21; see additional data at (55)]. DNA sequences with a motif surrounded by STRs were consistently bound 0.23 to 0.90 kcal/mol more tightly than those with a motif surrounded by random sequences, corresponding to an ~1.5- to 4.6-fold change in predicted affinity (Fig. 1, D and E). Distributions of measured ΔΔGs for sequences containing STRs were statistically significantly different from those with random sequences (as assessed by bootstrap hypothesis testing with a Bonferroni-corrected significance threshold; fig. S22 and table S5), and these effects scaled with STR length (fig. S23). Measured ΔΔGs did not change with ~5-fold differences in protein concentration, confirming that DNA was in vast excess of available protein (figs. S24 and S25). Measured ΔΔGs were also consistent when using either wheat germ extract or Tris-buffered saline (TBS) as a binding buffer (fig. S26), and negative control experiments assessing binding to enhanced green fluorescent protein (eGFP) alone showed no variability above the background RMSE (maximum deviation of +0.5 kcal/mol; fig. S27). Linear mononucleotide specificity models such as the position-specific affinity matrix (PSAM) predicted a <0.1 kcal/mol effect for all flanking sequences but one (“Motif + GT/AC repeat 2”) (fig. S28), establishing that the measured effects are not due to cryptic consensus sites in flanking sequences.
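The fold changes quoted here follow from the relation ΔΔG = RT ln(fold change). A short sketch of that conversion, assuming 25 °C (298.15 K) because the assay temperature is not stated in this section, reproduces the ~1.5- to 4.6-fold range for 0.23 to 0.90 kcal/mol; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def fold_change(ddg_kcal, temp_k=298.15):
    """Fold change in affinity implied by a free-energy difference.

    Uses ddG = RT ln(fold), so fold = exp(ddG / RT); temp_k defaults to 25 °C.
    """
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

def ddg_from_fold(fold, temp_k=298.15):
    """Inverse conversion: free-energy difference (kcal/mol) for a fold change."""
    return R_KCAL * temp_k * math.log(fold)

# Values quoted for Library 1 (0.23 to 0.90 kcal/mol) and Library 2 (1.7 and 0.8)
for ddg in (0.23, 0.90, 1.7, 0.8):
    print(f"{ddg:.2f} kcal/mol -> {fold_change(ddg):.1f}-fold")
```

At this temperature RT is about 0.59 kcal/mol, so roughly every 1.36 kcal/mol corresponds to a 10-fold change in affinity.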
Fig. 1. Repetitive flanking sequences alter TF-DNA binding affinities in a sequence-specific manner. (A) Crystal structures and PSAM logos (47) for Pho4 [Protein Data Bank (PDB) ID: 1A0A] and MAX (PDB ID: 1HLO). (B) Library 1: 17 DNA sequences with either an extended (10-bp) E-box motif (dark gray) or random (light gray) sequence surrounded by 13 bp on either side of repetitive (red) or random (light gray) sequence. (C) MITOMI microfluidic device (left) and zoomed-in view of three chambers (top right) showing solubilized DNA during incubation (prewash A647), immobilized TFs (eGFP), and TF-bound DNA after washing (postwash A647). Bottom right shows representative concentration-dependent binding for DNA sequences containing an extended E-box surrounded by either repetitive (red) or random (gray) flanks. (D) Measured ΔΔG values across all Library 1 sequences for Pho4 (left) and MAX (right). ΔΔGs were calculated relative to the overall median value for oligonucleotides bearing an E-box consensus surrounded by random flanking sequence. Light gray dots show all measurements; red circles indicate median values per oligo. (E) Median values (black markers and box plots) for all sequences containing either repetitive (red) or random (gray) flanking sequences for Pho4 (left) and MAX (right). (F) Library 2: 10 DNA sequences containing a central extended (10-bp) E-box motif surrounded by 60 bp on either side of listed homopolymeric, dinucleotide, or tetranucleotide repeats. (G) Measured ΔΔG values across all Library 2 sequences for Pho4 (gold) and MAX (blue). ΔΔGs were again calculated relative to the overall median value for oligonucleotides bearing an E-box consensus surrounded by random flanking sequence. Gray bars indicate the magnitude of effects predicted by PSAMs. (H) Observed effects on ΔΔG for mutating single nucleotides within the CACGTG core E-box (core) (47) versus altering flanking sequence within Library 2 (distal) for Pho4 (left, gold) and MAX (right, blue), overlaid on box plots (gray).
Magnitude of STR effects on affinity depends on STR sequence

To determine how STR sequence alters binding, we designed a DNA library containing either an extended consensus E-box motif or a random sequence surrounded on each side by 60-bp flanks (the approximate mean length of STRs in humans and budding yeast; fig. S4) composed of random sequence or homopolymer, dinucleotide, or tetranucleotide STRs (Library 2; Fig. 1F and table S6; CG/AT indicates a CG repeat on one side of the motif and an AT repeat on the other). Because repetitive sequence extension can be technically challenging, we visualized extension through denaturing gel electrophoresis (fig. S29) and quantified binding affinities only for sequences that extended successfully (Fig. 1G, figs. S30 to S36, and table S6). The observed effects ranged from increasing affinity by 1.7 kcal/mol (18-fold) to reducing affinity by 0.8 kcal/mol (4-fold). Whereas ATGC STRs enhanced binding for both Pho4 and MAX, other STRs (AT/AT, ATCG/ATCG, and AG/CT) were deleterious for MAX only (Fig. 1G, figs. S37 and S38, and table S7). As for Library 1, the distribution of measured ΔΔGs for multiple sequences with STRs differed significantly from those with different random sequences (fig. S39 and table S8), results did not change with surface protein density (figs. S40 and S41), no sequence-specific binding was detected for an eGFP-only negative control (fig. S42), and the observed effects were inconsistent with PSAM-based models of specificity (Fig. 1G). Effects also diverged significantly for Pho4 and MAX (Fig. 1G), signifying that “consensus” binding motifs are insufficient to predict STR preferences. Energetic contributions of flanking sequences approached or exceeded those associated with mutating core consensus residues (47), particularly for MAX, suggesting that STRs could play a significant role in proper TF localization in vivo (Fig. 1H).
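The significance tests for Library 1 and Library 2 are described as bootstrap hypothesis tests with a Bonferroni-corrected threshold. A minimal sketch of that kind of test follows; the test statistic (absolute difference in group means), the pooled-resampling null, the number of resamples, and the example ΔΔG values are all assumptions, not the authors' exact settings.

```python
import random
import statistics

def bootstrap_pvalue(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap test for a difference in means between groups a and b.

    Under the null hypothesis both groups share one distribution, so each
    resample draws (with replacement) from the pooled values.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_boot):
        ra = [rng.choice(pooled) for _ in a]
        rb = [rng.choice(pooled) for _ in b]
        if abs(statistics.mean(ra) - statistics.mean(rb)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_boot + 1)

def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the Bonferroni-corrected threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

str_flanks = [-0.8 + 0.01 * i for i in range(20)]  # hypothetical ddG values (kcal/mol)
random_flanks = [0.01 * i for i in range(20)]
p = bootstrap_pvalue(str_flanks, random_flanks)
print(p, bonferroni([p, 0.02, 0.3]))
```

Dividing alpha by the number of comparisons keeps the family-wise error rate at or below alpha when many STR-versus-random comparisons are made.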


STRs alter affinities by directly binding TFs 


The observed STR effects suggest two possi- 
ble mechanistic models (Fig. 2A). In the first, 
STRs could enhance TF binding to the core 
consensus site, perhaps by altering local DNA 
“shape” (56-60) (Fig. 2A, top). This model pre- 
dicts that STRs should only alter binding in 
the presence of a core motif and that TF-DNA 
stoichiometry should not depend on flanking 
sequence. The second model is that STRs could 
represent additional binding sites (Fig. 2A, 
bottom). This model predicts that STRs should 
enhance binding regardless of whether they 
flank a consensus motif and that multiple TFs 
will bind oligonucleotides containing STRs. 
Concentration-dependent binding for Pho4 
and MAX was clearly stronger for sequences 
containing favorable STRs even in the absence 
of a motif (Fig. 2, B and C). Moreover, energetic effects of STRs did not correlate with
predicted DNA shape parameters (figs. S43 
and S44), and circular dichroism spectroscopy 
ruled out enhanced binding resulting from 
STR-dependent structural transitions between 
B- and Z-form DNA (fig. S45). Finally, electro- 
phoretic mobility shift assays (EMSAs) using 
Alexa Fluor 647-labeled double-stranded DNA 
(dsDNA) and increasing concentrations of eGFP- 
tagged (Fig. 2D and fig. S46) or untagged MAX 
(fig. S47) revealed supershifted bands at higher 
MAX concentrations for DNA sequences con- 
taining STRs, consistent with multiple TFs 
binding a single DNA molecule. Together, these 
experiments demonstrate that STRs modu- 
late TF-DNA affinity by directly binding TFs 
in vitro. 


Statistical mechanical models integrating data across experimental platforms accurately predict STR effects


Universal protein-binding microarray (uPBM) experiments measure binding of fluorescently tagged TFs to surface-immobilized DNA duplexes containing all possible 8-mer DNA sequences, providing comprehensive measurements of TF-DNA specificity in an alternate (flipped) experimental configuration relative to MITOMI (61-64). To determine whether previously published uPBM measurements also reveal enhanced binding of Pho4 and MAX to specific STRs, we calculated the median intensity for all probes containing each of the 65,536 possible DNA 8-mers and then calculated a Z score for each 8-mer relative to this distribution (Fig. 2, E and F). As expected, probes containing 8-mer variants of the known E-box CACGTG consensus were bound very strongly by Pho4 and MAX, with Z scores of 40 to 80 (Fig. 2F). Consistent with MITOMI results, favorable repeats were bound statistically significantly above background after Bonferroni correction for both MAX (ATGC,
Z=151,P=4x10' CG,Z=83,P=5x10™; 
and AC, Z= 5.0, P=1x 10”) and Pho4 (ATGC, 
Z=10.7,P=7x10”; GC,Z= 3.9, P=3x10"; 
and AC, Z = 5.4, P = 9 x 10°°; Fig. 2F). 
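The 8-mer scoring step described above can be sketched in a few lines, assuming probes are given as (sequence, intensity) pairs. Two simplifications are assumptions of this sketch rather than details from the text: reverse-complement 8-mers are not merged, and Z scores use the mean and standard deviation of the per-8-mer medians. A full uPBM design covers all 4^8 = 65,536 8-mers; the toy input here is synthetic and covers only a small subset.

```python
import random
import statistics
from collections import defaultdict

def kmer_z_scores(probes, k=8):
    """Median probe intensity per k-mer, converted to Z scores across k-mers.

    probes: iterable of (sequence, intensity) pairs.
    """
    intensities = defaultdict(list)
    for seq, intensity in probes:
        # Count each k-mer once per probe, even if it occurs several times.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            intensities[kmer].append(intensity)
    medians = {kmer: statistics.median(v) for kmer, v in intensities.items()}
    mu = statistics.mean(medians.values())
    sd = statistics.pstdev(medians.values())
    return {kmer: (m - mu) / sd for kmer, m in medians.items()}

# Toy data: dim random background probes plus two bright E-box probes.
rng = random.Random(1)
probes = [("".join(rng.choice("ACGT") for _ in range(20)), 1.0) for _ in range(200)]
probes += [("TTTTGTCACGTGACTTTT", 20.0), ("GGGGGTCACGTGACGGGG", 20.0)]
z = kmer_z_scores(probes)
print(round(z["CACGTGAC"], 1))
```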

Next, we combined information from the PBM and MITOMI experiments to determine whether partition function models from statistical mechanics improve binding predictions by accurately accounting for flanking sequence effects (Fig. 2, E and G). MITOMI-measured ΔΔGs and log-transformed gcPBM intensities for MAX binding to 32 probes (figs. S48 to S51 and tables S9 and S10) were strongly anticorrelated (Rp = 0.89) over a wide dynamic range (2.5 kcal/mol) (fig. S52), confirming reports that PBM intensities can report on affinities (62, 65-68). This allowed us to compute a partition function from intensities and predict Pho4 and MAX ΔΔGs for all DNA Library 1 and 2 sequences (Fig. 2G; see the materials and methods). For DNA Library 1, which contains intact, mutated, or ablated E-box consensus sequences surrounded by 13-bp variable flanking sequences, partition function-based predictions significantly improve agreement with measured ΔΔGs over standard PSAM predictions (R² = 0.91 versus 0.66 and R² = 0.93 versus 0.74 for Pho4 and MAX, respectively) (fig. S53). For DNA Library 2, in which all sequences contain an E-box but differences in flanking sequences can change measured ΔΔGs by up to 1.6 and 2.5 kcal/mol for Pho4 and MAX, respectively, partition function-based calculations were substantially better correlated with measurements than PSAM models (R² = 0.71 versus 0.09 and R² = 0.81 versus 0.02 for Pho4 and MAX, respectively) (Fig. 2G and fig. S54). Returned fit parameters from these linear regressions allow calibration of partition function-based predictions in energetic space with as few as nine thermodynamic measurements (Kds or ΔΔGs; fig. S55; see the materials and methods).
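A minimal sketch of a partition-function prediction and its linear calibration follows. It treats every 8-mer window as an independent binding microstate with a Boltzmann-like weight exp(β·Z), where the Z-to-energy scaling β is a hypothetical parameter, and then fits ΔΔG = slope·ln(score) + intercept by ordinary least squares against a few measured values. This mirrors the logic described here but is not the authors' exact transformation; the Z scores and sequences are invented for illustration.

```python
import math

def partition_score(seq, kmer_z, beta=1.0, k=8):
    """Sum of exp(beta * z) over all k-mer windows of seq.

    Windows absent from kmer_z are treated as background (z = 0, weight 1).
    beta is a hypothetical Z-score-to-energy scaling factor.
    """
    return sum(math.exp(beta * kmer_z.get(seq[i:i + k], 0.0))
               for i in range(len(seq) - k + 1))

def calibrate(scores, measured_ddg):
    """Least-squares fit of measured ddG = slope * ln(score) + intercept."""
    xs = [math.log(s) for s in scores]
    n = len(xs)
    mx, my = sum(xs) / n, sum(measured_ddg) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, measured_ddg))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical Z scores: every register of an ATGC repeat is weakly preferred.
kmer_z = {km: 3.0 for km in ("ATGCATGC", "TGCATGCA", "GCATGCAT", "CATGCATG")}
repeat_score = partition_score("ATGC" * 10, kmer_z)
random_score = partition_score("ACGTTGCAAGTCCTAGGATCCGATAGCTTACG", kmer_z)
print(round(repeat_score, 1), round(random_score, 1))
```

Because every register of the repeat contributes its own weight, the repeat's score grows with length even though no single window resembles the E-box.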

To determine whether sequencing-based selection experiments also reveal binding for MAX to the same STRs, we quantified the frequency with which each 8-mer DNA sequence appeared within the TF-bound fractions in the SMiLE-seq (69) and SELEX-seq (70) datasets, converted frequencies to Z scores, and again used these Z scores in a partition function to predict binding to DNA Library 1 and 2 sequences. Predicted binding was well correlated with observations for both libraries (R² = 0.86 and 0.75 for Library 1 and R² = 0.59 and 0.81 for Library 2 for SMiLE-seq and SELEX-seq, respectively; figs. S56 and S57).


Even weakly preferred STRs enhance binding by increasing the number of preferred microstates

Preferred repeats for Pho4 and MAX (e.g., CG and ATGC) do not resemble the known E-box consensus, as evidenced by a failure of PSAM-based models to predict the observed effects (Figs. 1H and 2G and figs. S28 and S53). Why, then, do repeats recruit TFs? By virtue of being repetitive, STRs create multiple identical binding sites that are equally probable binding microstates (Fig. 2H), and STRs (in particular, homopolymers) maximize binding entropy and therefore minimize Gibbs free binding energy when enthalpy is kept constant (see the supplementary text). To estimate the energetic magnitude of this statistical effect, we conducted Monte Carlo simulations that randomly sample from observed energy distributions to mimic either random or homopolymeric sequences (fig. S58; see the materials and methods). These simulations revealed that increasing repetitiveness alone can contribute up to 0.3 kcal/mol mean binding energy through entropic effects for sequences <60 bp (fig. S58). However, effects are considerably stronger for STRs with affinities only slightly above background binding: 57-bp dinucleotide STRs with intensity Z scores of 1 to 2 or 5 to 10 are predicted to enhance binding by 0.6 and 1.4 kcal/mol (10-fold), respectively (fig. S58). Further validating these effects, partition function-predicted energy distributions for Pho4 and MAX binding 10,000 simulated sequences containing E-box consensus sites flanked by either random sequence or STRs showed that the most-favorable STRs bound more strongly than all 10,000 random sequences (fig. S59), a result not predicted by analogous simulations using mononucleotide models (fig. S60). A partition function model is purely additive, and additional mechanisms of cooperativity [e.g., allostery, avidity, and allovalency (71)] are not necessary to explain effects of STRs on in vitro binding.

Fig. 2. STRs are directly bound by TFs with observed affinities that can be accurately predicted by statistical mechanics. (A) Models explaining how repetitive flanking sequences could enhance TF binding affinities. (B) Representative concentration-dependent binding for Pho4 (left) and MAX (right) interacting with DNA sequences containing either repetitive (red) or random (gray) sequences in the absence of an E-box motif. (C) Box plots of relative binding energies (ΔΔGs) for Pho4 and MAX binding to oligonucleotides with repetitive (red) or random (gray) sequence flanking an extended E-box consensus (dark gray) or random sequence (light gray); black and red dashed lines indicate median overall affinities. (D) EMSAs for increasing concentrations of eGFP-tagged MAX interacting with Alexa Fluor 647-labeled dsDNA duplexes containing a central extended E-box surrounded by random (left) or repetitive (right) sequences. Blue boxes highlight TF complexes bound to the core motif; red boxes highlight supershifted species with additional bound TFs. Native gel electrophoresis reveals that MAX alone runs as three bands, likely representing MAX homodimers, MAX monomers, and eGFP-only truncation constructs (fig. S46). (E) Pipeline for calculating 8-mer intensity Z scores from universal PBM data and calibrating partition function scores to predict binding (see the materials and methods). (F) Log-linear histograms of intensity Z scores for all 8-mers for Pho4 (left) and MAX (right). Insets show linear-linear plots that highlight background binding distributions and Z scores of the STRs measured in this study (red bars, top). (G) Scatter plots, linear regressions, and correlation coefficients for measured ΔΔGs versus calibrated partition function-predicted scores across all measured repeats for Pho4 (left) and MAX (right). (H) Schematic showing possible microstates as a function of sequence.
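The additive partition-function argument in this subsection can be checked with simple arithmetic. For independent sites, Z = Σᵢ exp(−ΔGᵢ/RT) and ΔG_eff = −RT ln Z, so N identical sites lower the effective free energy by RT ln N. The sketch below assumes a hypothetical motif energy of 0 and flanking registers that are each 3 kcal/mol weaker (values chosen for illustration, not taken from the measurements) and shows how many weak sites add up. It is a deterministic illustration, not the authors' Monte Carlo procedure.

```python
import math

RT = 0.5925  # kcal/mol at ~25 °C (assumed temperature)

def effective_dg(site_dgs, rt=RT):
    """Effective binding free energy for independent, additive sites.

    Z = sum_i exp(-dG_i / RT); dG_eff = -RT * ln(Z). Lower (more negative)
    values mean tighter binding.
    """
    z = sum(math.exp(-dg / rt) for dg in site_dgs)
    return -rt * math.log(z)

motif_dg = 0.0  # hypothetical reference energy of the consensus motif
weak_dg = 3.0   # hypothetical energy of each weakly preferred STR register
for n_weak in (0, 10, 50, 100):
    ddg = effective_dg([motif_dg] + [weak_dg] * n_weak) - effective_dg([motif_dg])
    print(f"{n_weak:3d} weak sites: ddG = {ddg:+.2f} kcal/mol")
```

With these illustrative numbers, 100 registers that are each 3 kcal/mol weaker than the motif still tighten overall binding by roughly 0.3 kcal/mol, without invoking any cooperativity.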


STRs are directly bound by TF DNA-binding domains


Our results thus far had established that TFs directly bind STRs but did not identify which portion of the TF recognizes them. STRs may be recognized by intrinsically disordered regions (IDRs) outside of TF DNA-binding domains (DBDs) (72) or by DBDs themselves. To distinguish between these, we compared binding for eGFP-tagged full-length Pho4, the DBD alone, or the non-DBD alone to six DNA sequences containing either the extended E-box motif (GTCACGTGAC) or no motif surrounded by random sequence or favorable or moderately favorable STRs (ATGC and CG/AT, respectively; Fig. 3, A to C; figs. S61 to S63; and tables S11 and S12). Both full-length and DBD-only constructs showed enhanced binding to repeats (Fig. 3, A to C, and figs. S61 and S62) with strongly correlated measured Kd values (Rp = 0.99; Fig. 3C). Consistent with prior reports that IDRs outside of the DBD can inhibit DNA binding (73-77), fluorescence intensity ratios (DNA bound per surface-immobilized TF) were consistently lower for the full-length construct (Fig. 3B). By contrast, the Pho4 non-DBD did not bind DNA with either random or the most-favorable ATGC STR flanking sequences above background levels, and measured Kds were uncorrelated with the full-length construct (Fig. 3C and fig. S63). Although the Pho4 non-DBD exhibited detectable binding to the moderately favorable CG/AT STR (fig. S63), binding was extremely weak (Kd > 15 μM) and disappeared in the absence of a CACGTG motif (fig. S63), inconsistent with observations for full-length Pho4 (Fig. 2, B and C).

Because MAX has an extremely small non-DBD (49 residues versus 202 residues for Pho4; fig. S64A), we anticipated that the MAX non-DBD was unlikely to bind STRs. To test this, we compared 8-mer Z scores between previously published uPBM (62) and SELEX-seq (70) data for full-length MAX and the DBD alone (fig. S64). If the MAX non-DBD binds STRs, then we would expect the full-length construct to return higher Z scores for favorable STRs compared with the DBD alone. Instead, all 8-mer Z scores were linearly correlated between constructs (R² = 0.72 and 0.90 for uPBM and SELEX-seq, respectively). Together, these analyses demonstrate that Pho4 and MAX recognize STRs through their DBDs.

To investigate which residues within the Pho4 DBD mediate STR recognition, we used STAMMP (50) (Fig. 3, D and E) to recombinantly express and purify 221 Pho4 variants containing systematic amino acid mutations within and surrounding the DBD (Fig. 3F and table S13) and to quantify concentration-dependent binding for each variant interacting with DNA sequences containing a motif flanked by either random sequence or favorable CG dinucleotide STRs (Fig. 1G). Across nine STAMMP experiments, 214 of 221 variants showed strong expression (Fig. 3E, figs. S65 and S66, and table S14), and concentration-dependent binding was well fit by a Langmuir isotherm across both DNA sequences (Fig. 3G and figs. S67 to S72), yielding 6139 individual TF-DNA Kd measurements. After normalization between experiments, measured energetic effects were consistent across experiments (<0.48 kcal/mol RMSE) and spanned >4 kcal/mol (figs. S69 and S72).

We then compared measured ΔΔGs for each mutant relative to the wild-type (WT) TF across DNA sequences, reasoning that residues involved in STR recognition should differentially affect affinity upon mutation (Fig. 3H). Nearly all mutants altered binding affinities equally across DNA sequences, but E259D showed significantly enhanced binding to CG dinucleotide-flanking sequences (Fig. 3H and figs. S73 and
S74; Z score of residual = 6.0, P = 1.7 x 10°, 
ΔΔΔG = 0.73 kcal/mol). In the Pho4 crystal structure, E259 directly contacts nucleotides from both strands at the CACGTG position (78) (Fig. 3I), and comparisons of measured affinities for WT Pho4 and E259D revealed that although WT Pho4 showed a strong preference for the canonical E-box motif (CACGTG), E259D showed equal, weak (100-fold lower) binding to the canonical E-box and a motif mutated at this position (CACGCG) (50) (Fig. 3J). These observations are consistent with a model in which increased promiscuity of the E259D binding energy landscape leads to an effective increase in preference for CG dinucleotide repeats (Fig. 3K).
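One way to flag sequence-specific mutants, sketched here under stated assumptions, is to regress each variant's ΔΔG on the CG-repeat sequence against its ΔΔG on the random-flank sequence and standardize the residuals; a large residual marks a mutation that changes STR preference rather than overall affinity. The ordinary-least-squares fit and the residual standardization are assumptions about how such a residual Z score could be computed, and the data are synthetic.

```python
import statistics

def residual_z_scores(x, y):
    """Fit y = a*x + b by ordinary least squares; return standardized residuals.

    x: per-variant ddG on the random-flank DNA; y: ddG on the CG-repeat DNA.
    """
    mx, my = statistics.mean(x), statistics.mean(y)
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    sd = statistics.stdev(residuals)  # residual mean is 0 for OLS with an intercept
    return [r / sd for r in residuals]

# Synthetic variants: most shift both sequences equally; variant 7 gains
# extra affinity (more negative ddG) only on the CG-repeat sequence.
x = [0.1 * i for i in range(20)]
y = [xi + 0.02 * (-1) ** i for i, xi in enumerate(x)]
y[7] -= 0.8
z = residual_z_scores(x, y)
outlier = max(range(len(z)), key=lambda i: abs(z[i]))
print(outlier, round(z[outlier], 1))
```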


STRs increase apparent macroscopic association rates


To investigate how flanking sequences alter TF binding kinetics, we leveraged k-MITOMI (49) (Fig. 4A) to quantify dissociation rates for Pho4 and MAX interacting with DNA sequences containing an extended E-box motif (GTCACGTGAC) surrounded by 60-bp flanks composed of random sequence or eight different STRs that extended properly (homopolymer: A/A; dinucleotide: AT/AT, AG/CT, GT/AC; tetranucleotide: ACGT/ACGT, ATCG/ATCG, ACTG/AGTC, ATGC/ATGC). Specifically, we iteratively (i) closed valves to trap TF-bound DNA, (ii) introduced a high-affinity unlabeled DNA competitor, (iii) opened valves for 1 to 4 s to allow fluorescently labeled DNA to dissociate, (iv) closed valves and washed out unbound material, and (v) imaged all device chambers (Fig. 4A). Excess unlabeled DNA competitor prevented rebinding, ensuring accurate rate measurements. Decreases in measured Alexa Fluor 647/eGFP (DNA/TF) intensity ratios over time were well fit by a single exponential for Pho4 and MAX [Fig. 4B and figs. S75 and S76; see additional data at (55)]; rates typically varied by <3-fold across experiments before normalization (figs. S77 and S78). For both Pho4 and MAX, different 60-bp flanking STRs changed apparent dissociation rates (k_off,apparent) from the entire dsDNA sequence only slightly (<1.7-fold, less than noise between experiments) (Fig. 4C and figs. S77 and S78). By contrast, inferred apparent on-rates (k_on,apparent = k_off,apparent/K_d, calculated assuming a two-state model in which DNA is either bound or unbound) were substantially altered (Fig. 4C and figs. S79 to S83). These results were consistent across different normalization schemes (figs. S84 to S88), with favorable STRs increasing macroscopic apparent on-rates by 7-fold for Pho4 and 54-fold for MAX, suggesting that the observed changes in affinity were primarily caused by altered macroscopic apparent association rates (Fig. 4C and fig. S87).
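The apparent on-rate discussed above is derived rather than measured directly: k_off,apparent comes from a single-exponential fit to the decaying DNA/TF intensity ratio, and k_on,apparent = k_off,apparent/K_d follows from the two-state model. A minimal sketch, assuming background-subtracted ratios with no constant offset, a time axis of cumulative valve-open time, and illustrative parameter values that are not from the study:

```python
import numpy as np

def fit_koff(times_s, ratios):
    """Log-linear least-squares fit of a single exponential, ratio = A * exp(-k * t).

    Assumes background-subtracted ratios with no constant offset; data with a
    baseline would instead need a nonlinear fit that includes an offset term.
    """
    t = np.asarray(times_s, dtype=float)
    y = np.log(np.asarray(ratios, dtype=float))
    slope, _intercept = np.polyfit(t, y, 1)  # highest-degree coefficient first
    return -slope

def kon_apparent(koff, kd_molar):
    """Two-state relation: k_on,apparent = k_off,apparent / K_d, in M^-1 s^-1."""
    return koff / kd_molar

# Synthetic example: cumulative dissociation time after repeated 2-s valve openings
t = np.arange(0, 40, 2.0)
ratios = 1.5 * np.exp(-0.05 * t)
koff = fit_koff(t, ratios)
print(koff)                        # close to 0.05 s^-1
print(kon_apparent(koff, 10e-9))   # close to 5e6 M^-1 s^-1
```

Because k_off,apparent barely changed across flanks, fold changes in k_on,apparent computed this way essentially mirror fold changes in 1/K_d.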


STRs increase the density of weakly bound TFs 
near target motifs 


STRs are enriched near binding sites of stress-response TFs in budding yeast that likely require a rapid transcriptional response (3), suggesting that STRs could reduce search times in vivo. To model how changes to motifs and flanking sequences alter TF search behavior, we expanded our two-state model (in which a single TF is either bound or not bound to any location within the DNA; Fig. 2A) to a four-state continuous-time Markov chain (CTMC) model in which a single TF may be (i) free (nonspecifically diffusing in the nucleoplasm), (ii) testing (near DNA or nonspecifically bound to DNA), (iii) bound to a motif, or (iv) bound to the flanks (Fig. 4D; see the materials and methods). The rate constant for transitioning between the free and testing states is given by k_on,max (the theoretical upper bound for the on-rate if all nonspecific TF-DNA interactions result in specific binding); rate constants for transitioning from the motif- or flank-bound state to the testing state are given by k_off,motif and k_off,flank; and the probabilities of transitioning to the motif or flanks depend on the likelihood of binding either sequence (f_flank or f_motif) and on the rate at which TFs transition from the testing state back to the free state (k_off). Together, this yields a simple expression for the transition probability from the testing state to either the flank or motif, P_testing,x = f_x/(1 + f_flank + f_motif), with x ∈ {flank, motif}. Assuming that the time spent in the testing state is negligible, this four-state model can determine these microscopic rate constants from macroscopic measurements of affinities and apparent dissociation rates for sequences containing a consensus E-box,
Fig. 3. Mutations within TF DBDs alter repeat sensitivity. (A) Schematic illustrating MITOMI experiment quantifying binding of full-length, DBD-only, and non-DBD-only Pho4 constructs to DNA sequences either containing or lacking motifs surrounded by either random sequence or favorable STRs. bHLH indicates the bHLH DBD within Pho4. (B) Measured concentration-dependent binding for full-length, DBD-only, and non-DBD-only Pho4 constructs. Markers denote measured intensities from individual chambers; lines indicate Langmuir isotherm fits. (C) Scatter plots comparing measured K_ds for DBD-only Pho4 versus full-length Pho4 (left) and non-DBD-only Pho4 versus full-length Pho4 (right). Marker bars indicate mean across all chambers, error bars indicate SD, dashed line indicates linear regression, and P values indicate the significance of correlation. (D) Experimental pipeline for STAMMP illustrating steps for recombinant protein expression, surface immobilization, purification, and measurement of concentration-dependent binding behavior. (E) Example zoomed-in fluorescence images showing immobilized TFs and concentration-dependent DNA binding. (F) Schematic of C-terminally eGFP-tagged Pho4 and location of scanning mutants. (G) Example concentration-dependent binding measurements and Langmuir isotherm fits for WT Pho4 and two mutants (L270V and R263L) interacting with “Motif + random 1.” (H) Effects of TF mutations on relative DNA-binding affinity for an extended E-box consensus flanked by CG repeats versus random sequence. Black dashed line indicates 1:1 relationship, red dashed line indicates linear regression, and color bar indicates Z score of residuals from linear regression. (I) Zoomed-in crystal structure showing contacts between the WT E259 and E-box consensus (PDB ID: 1A0A). (J) Affinities for Pho4 WT and E259D mutants interacting with consensus E-box and five single-nucleotide substitutions. (K) Reaction coordinate diagram of binding specificity landscapes for Pho4 WT and E259D.


a weak E-box, or a scrambled sequence surrounded by 13-bp flanks composed of either GT/AC or CG/AT dinucleotide repeats or random sequence (DNA Library 1; Fig. 4E and figs. S89 to S97; see the materials and methods). Consistent with recent work on E. coli LacI binding to various operator sequences (79), mean microscopic dissociation rates (k_off,motif) for sequences with a consensus E-box or a weak E-box were similar, but affinities and microscopic association likelihoods differed by 12- or 16-fold (figs. S98 to S100). Systematically
Fig. 4. Repetitive flanking sequences increase macroscopic association rates and reduce mean first passage time. (A) Experimental pipeline for k-MITOMI (see the materials and methods). (B) Example dissociation curves for MAX interacting with DNA Library 2 sequences showing per-chamber measurements (markers), per-chamber single-exponential fits (lines), and the average of returned fit parameters (annotation) for each sequence. (C) Measured k_off,apparent (left) and calculated k_on,apparent (right) values as a function of flanking sequence for Pho4 (yellow) and MAX (blue) interacting with DNA Library 2 sequences (all of which contain a core motif). (D) Proposed four-state model and associated microscopic rate constants for TF binding to sequences with a central core motif surrounded by different flanking sequences. (E) Average measured k_off,apparent (circle markers, left axis) and calculated k_on,apparent (diamond markers, right axis) values versus measured affinities (K_ds) for Pho4 (yellow) and MAX (blue) interacting with all sequences from DNA Library 1. (F) Sample TF trajectories from Gillespie simulations modeling 2600 TFs interacting with a single DNA sequence containing a consensus motif flanked by either repetitive (top, red) or random (bottom, gray) flanks. DNA can be unbound, associated with TFs in a “testing” state, bound by a TF at the motif, bound by a TF at the flanking sequence, or bound by TFs at the motif and flanking sequence simultaneously. (G) Log-linear distribution of TF dwell times across 1000 simulations for sequences with a consensus motif flanked by CG repeats, GT repeats, or random sequence. Inset shows mean dwell times by sequence. (H) Log-linear distribution of the lengths of time a DNA sequence is bound by at least one TF across 1000 simulations for sequences with a consensus motif flanked by CG repeats, GT repeats, or random sequence. Inset shows mean time occupied by sequence. (I) Mean first passage time (black markers, left axis; units relative to fastest possible search time, 1/(k_on,max[TF])), mean motif occupancy (blue markers, right axis), mean flank occupancy (red markers, right axis), and mean total DNA occupancy (purple markers, right axis) as a function of the likelihood of binding flanking sequence. Gray box indicates the range of affinities for random flanks; pink and red boxes correspond to f_flank values for GT and CG repeats, respectively.
quantifying how changes in microscopic rate constants affect macroscopic observables (K_d and k_off,apparent) reveals combinations that can differentially affect affinity and kinetics [e.g., altering k_on,max does not change overall dissociation rates but can alter affinity when microscopic dissociation from the motif (k_off,motif) is slow; figs. S101 and S102]. Because fitted microscopic rate parameters often lie in regions of parameter space where concomitant variation in two parameters differentially tunes binding (figs. S101 and S102), STRs may maximize regulatory tunability (32, 80-83).
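The four-state model can be written as a CTMC generator matrix, from which mean first passage times follow by solving a linear system. This is an illustrative sketch, not the study's implementation. It assumes that f_motif and f_flank are expressed relative to the rate of leaving the testing state for the free state (called k_back here, a hypothetical name), so that testing-to-x rates are k_back·f_x and branching probabilities out of the testing state are f_x/(1 + f_flank + f_motif). k_in stands for k_on,max·[TF]; all numbers are hypothetical.

```python
import numpy as np

FREE, TESTING, MOTIF, FLANK = range(4)

def generator(k_in, k_back, f_motif, f_flank, koff_motif, koff_flank):
    """CTMC generator Q for one TF; Q[i, j] is the i -> j rate and rows sum to 0.

    Hypothetical parameterization: testing -> x occurs at k_back * f_x, so the
    branching probability from the testing state is f_x / (1 + f_flank + f_motif).
    """
    Q = np.zeros((4, 4))
    Q[FREE, TESTING] = k_in              # stands in for k_on,max * [TF]
    Q[TESTING, FREE] = k_back
    Q[TESTING, MOTIF] = k_back * f_motif
    Q[TESTING, FLANK] = k_back * f_flank
    Q[MOTIF, TESTING] = koff_motif
    Q[FLANK, TESTING] = koff_flank
    np.fill_diagonal(Q, -Q.sum(axis=1))  # diagonal = minus total outgoing rate
    return Q

def mfpt(Q, start, targets):
    """Mean time to first reach any state in `targets`: solve Q_TT @ tau = -1."""
    transient = [s for s in range(Q.shape[0]) if s not in targets]
    Q_tt = Q[np.ix_(transient, transient)]
    tau = np.linalg.solve(Q_tt, -np.ones(len(transient)))
    return tau[transient.index(start)]

# Time to reach the whole sequence (motif or flank) shrinks as f_flank grows.
for f_flank in (0.0, 0.9):
    Q = generator(k_in=1.0, k_back=10.0, f_motif=0.1, f_flank=f_flank,
                  koff_motif=0.1, koff_flank=1.0)
    print(f_flank, mfpt(Q, FREE, {MOTIF, FLANK}))
```

With k_in = 1, the times come out in units of 1/(k_on,max[TF]), the same normalization used for the MFPT axis of Fig. 4I. For this toy chain the closed form is (1 + F)/(F·k_in) + 1/(F·k_back) with F = f_motif + f_flank, which gives 12 and 2.1 for the two cases above.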

Using these microscopic rate parameters in Gillespie stochastic simulations to predict behavior for 2600 TFs [the estimated number of Pho4 copies in S. cerevisiae (84)] binding DNA within the yeast nucleus yielded individual TF trajectories that recapitulated the observed experimental trends (fig. S103) and showed that sequences with favorable flanking STRs were frequently occupied by multiple TFs (Fig. 4F and figs. S104 and S105). Although the DNA dwell time for any individual TF was largely independent of flanking sequence identity (Fig. 4G and fig. S106A), as expected given the absence of an observed macroscopic off-rate effect (Fig. 4, C and E, and fig. S87), DNA sequences with preferred flanking STRs were occupied by at least one TF for substantially more time (Fig. 4H and fig. S106B). Mean behavior across 100 simulations showed that as the relative affinity for flanking STRs increased, total DNA occupancy increased, creating a locally concentrated pool of TFs (Fig. 4I and fig. S106C). Although variations in the relative motif/flank affinity ratio did not affect the mean first passage time (MFPT) to the motif (the mean time for a TF to move from the free state to the motif state) or motif occupancy (as expected), changing this ratio altered the effective TF concentrations at which flanks were occupied (figs. S107 and S108). Even in this simple model, which does not consider proximity between the motif and flanks, favorable STRs thus reduce MFPT to the entire DNA sequence (motif and flanking sequences) (Fig. 4I and figs. S109 and S110), consistent with a hypothesized role for STRs in regulating stress responses (3) and with previous work showing that favorable STRs can act as “antennae” to enhance TF target search (85).
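For a single TF, the simulations described above can be approximated with the Gillespie direct method: draw an exponential waiting time from the total outgoing rate, then pick the next state in proportion to its rate. This is a hedged sketch, not the study's simulation code. It tracks one TF rather than 2600, ignores simultaneous motif and flank occupancy, and uses a hypothetical parameterization (k_back for the testing-to-free rate; testing-to-x rates of k_back·f_x). Because this chain is a simple star around the testing state, its long-run occupancies can be checked against detailed balance.

```python
import random

FREE, TESTING, MOTIF, FLANK = range(4)

def rate_table(k_in, k_back, f_motif, f_flank, koff_motif, koff_flank):
    """Outgoing transitions {state: [(next_state, rate), ...]} (hypothetical model)."""
    return {
        FREE: [(TESTING, k_in)],
        TESTING: [(FREE, k_back), (MOTIF, k_back * f_motif), (FLANK, k_back * f_flank)],
        MOTIF: [(TESTING, koff_motif)],
        FLANK: [(TESTING, koff_flank)],
    }

def gillespie_occupancy(rates, t_end, start=FREE, seed=0):
    """Simulate one TF trajectory; return the fraction of time spent in each state."""
    rng = random.Random(seed)
    time_in = [0.0] * 4
    state, t = start, 0.0
    while t < t_end:
        moves = [(s, r) for s, r in rates[state] if r > 0]
        total = sum(r for _, r in moves)
        dwell = min(rng.expovariate(total), t_end - t)  # truncate the last dwell
        time_in[state] += dwell
        t += dwell
        # Choose the next state with probability proportional to its rate.
        u, acc = rng.random() * total, 0.0
        for nxt, r in moves:
            acc += r
            if u < acc:
                state = nxt
                break
    return [x / t_end for x in time_in]

rates = rate_table(k_in=1.0, k_back=10.0, f_motif=0.1, f_flank=1.0,
                   koff_motif=0.1, koff_flank=1.0)
occ = gillespie_occupancy(rates, t_end=2e5)
print([round(x, 3) for x in occ])  # (free, testing, motif, flank)
```

With these illustrative parameters, detailed balance gives occupancies of 1/3.1, 0.1/3.1, 1/3.1, and 1/3.1 for the free, testing, motif, and flank states, and the simulated fractions should approach those values as t_end grows.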


STRs alter gene expression by tuning TF 
occupancies in vivo 


Although STRs have repeatedly been associated with changes in gene expression in cells, and the length of STRs in the genome exceeds the length required for an in vitro effect (figs. S4 and S23), our results thus far did not elucidate whether STRs alter TF occupancies in vivo. Directly quantifying the impacts of STRs on TF binding in cells is technically challenging, because the lower-affinity binding expected for STRs is unlikely to yield distinct peaks within chromatin immunoprecipitation data. To sensitively quantify effects of STRs in vivo, we trained the BPNet (37) neural network (NN) on in vivo chromatin immunoprecipitation sequencing (ChIP-seq) data with 5-fold cross-validation to predict TF binding profiles from DNA sequence with nucleotide resolution and then applied Affinity Distillation (AD) (86) to predict changes in log-transformed mean read counts [Δlog(counts)], which were previously shown to correlate with measured thermodynamic energies (ΔΔGs). If STRs alter gene expression in vivo by changing TF occupancies, then we would expect BPNet to learn that they affect TF binding and AD to predict sequence-dependent read count changes that mirror ΔΔGs measured in vitro.

After training on high-quality MAX ChIP-seq data (87, 88) (Fig. 5A), BPNet accurately predicted log-transformed read counts for held-out data (R² = 0.52), with binding profiles that reproduced those observed experimentally (Fig. 5A) (86). Returned contribution weight matrices (CWMs), which identify short subsequences most predictive of TF binding, revealed E-box-like motifs (CACGTG) that sometimes included a flanking preference for CG dinucleotides, consistent with in vitro preferences (Figs. 5A, 1G, and 2, G and H). Some CWMs also included an AP1-binding motif (TGACTCA), consistent with AP1 acting as a pioneer factor to increase chromatin accessibility for MAX (Fig. 5A) (89). AD-predicted 8-mer Z score distributions correlated better with distributions calculated from uPBM data than did either mononucleotide or dinucleotide models (R_s = 0.42, 0.21, and 0.22, respectively), likely because of an enhanced ability to accurately predict low-affinity interactions (figs. S111 to S114). AD-predicted log-transformed read counts for DNA Library 1 sequences also strongly correlated with measured ΔΔGs (R² = 0.78) and partition function-predicted binding energies [Fig. 5B and figs. S115 to S118; see additional data at (55)]. AD consistently predicted tighter binding to consensus motifs flanked by preferred STRs (Fig. 5B), and importance scores from DeepSHAP (90, 91), which identify base pair contributions to the observed model output, confirmed that enhanced binding was caused by the flanking STRs in these synthetic sequences (Fig. 5, C and D, and figs. S119 to S121). Together, these analyses suggest that observed in vivo effects of polymorphic STRs on gene expression can be explained at least in part by differential TF binding.


STR impacts extend over tens of nucleotides 
and mismatches reduce effects 


To determine the distance over which STRs affect binding, we quantified MAX binding affinities for DNA containing an E-box motif surrounded by increasing lengths (15, 30, 45, or 60 bp) of either disfavored (AG/CT) or favored (GT/AC) repeats using MITOMI (table S6). In parallel, we used AD to predict MAX occupancies and binding profiles for the same sequences (Fig. 5, E to G, and fig. S122). For disfavored AG/CT repeats, both MITOMI and AD revealed that increasing STR lengths monotonically reduced binding, with effects saturating after ~40 bp (Fig. 5G; R² = 0.80 between predictions and measured ΔΔGs). Returned DeepSHAP interpretations and cumulative importance scores confirmed a negative contribution from flanking STRs (Fig. 5, F and H). Favored GT/AC repeats showed more complex behavior, with short repeats (15 to 30 bp) increasing binding and longer repeats having only minor effects, but predictions were again consistent with experimental observations (fig. S122; R² = 0.93).
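Cumulative importance curves of the kind referred to above are running sums of per-position attributions. A minimal sketch, assuming DeepSHAP-style hypothetical contributions of shape (length, 4) that are masked by the one-hot-encoded sequence before summing; the arrays and helper names below are toy, hypothetical values and not model outputs.

```python
import numpy as np

def one_hot(seq):
    """(L, 4) one-hot encoding with columns in A, C, G, T order."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, index[base]] = 1.0
    return out

def per_position(hyp_contribs, seq):
    """Keep only the attribution for the base actually present at each position."""
    return (hyp_contribs * one_hot(seq)).sum(axis=1)

def cumulative_importance(hyp_contribs, seq):
    """Running sum of per-position attributions along the sequence."""
    return np.cumsum(per_position(hyp_contribs, seq))

def flank_vs_motif(hyp_contribs, seq, motif_start, motif_end):
    """Split the total attribution into motif and flank contributions."""
    scores = per_position(hyp_contribs, seq)
    motif = scores[motif_start:motif_end].sum()
    return motif, scores.sum() - motif

# Toy example: positive scores on a central E-box, small negative scores on AG flanks
seq = "AGAG" + "CACGTG" + "AGAG"
contribs = np.full((len(seq), 4), -0.1)
contribs[4:10] = 0.5
curve = cumulative_importance(contribs, seq)
print(curve[-1], flank_vs_motif(contribs, seq, 4, 10))
```

In this toy case the curve rises across the motif and falls across both flanks, the same qualitative shape expected when flanking repeats contribute negatively.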

Nearly 80% of repeated units within the median human STR match the consensus repeat exactly, with the remaining 20% containing an indel or mismatched base(s) (see the materials and methods). To investigate how imperfections within STRs alter binding, we applied MITOMI and AD to measure and predict MAX binding to seven increasingly scrambled (GT/AC) repeat sequences (Fig. 5I). Even though the relationship between measured affinities and repeat imperfection (as quantified by Shannon entropy) was nonmonotonic, AD accurately predicted energetic measurements (R² = 0.84), suggesting that the algorithm had learned that the increased multiplicity of even weakly preferred STRs enhances binding and that energetic impacts depend not only on nucleotide composition but also on repeat imperfection (Fig. 5, J and K).
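This section quantifies repeat imperfection by Shannon entropy but does not spell out the calculation here. One plausible version, offered only as an assumption, is the entropy (in bits) of overlapping dinucleotide frequencies: a perfect GT repeat contains only GT and TG steps and scores close to 1 bit, and scrambling the same bases raises the score. The scrambled sequence below is hypothetical, not one of the seven measured sequences.

```python
import math
from collections import Counter

def kmer_entropy(seq, k=2):
    """Shannon entropy (bits) of overlapping k-mer frequencies in `seq`."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

perfect = "GT" * 10
scrambled = "GGTTGTTGGTGTTGGTGTTG"  # same length and base composition, hypothetical
print(round(kmer_entropy(perfect), 3), round(kmer_entropy(scrambled), 3))
```

Because the two sequences share the same base composition, a composition-only score cannot distinguish them, whereas this dinucleotide entropy can.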


TF binding to STRs is widespread across 
structural families and organisms 


To determine whether STR binding is specific to Pho4 and MAX or more widespread, we analyzed PBM data for 1291 TFs from 114 species, including S. cerevisiae, Arabidopsis thaliana, D. melanogaster, Caenorhabditis elegans, M. musculus, and H. sapiens (61, 64, 92) (table S15). For each experiment for each TF, we iterated through all 65,536 (4^8) 8-mers, computed median intensities for all probes containing each 8-mer, and calculated Z scores relative to this distribution for all 39 nonredundant homopolymeric, dinucleotide repeat, and tetranucleotide repeat 8-mer STRs (Fig. 6A and figs. S123 and S124; see the materials and methods). TF preference for STRs was ubiquitous, with 90% (1158/1291) of all TFs binding at least one STR with P < 13 x 10°? (the Bonferroni-corrected threshold for significance; figs. S125 and S126 and table S15), and STR preferences varied widely across TF families (figs. S127 to S143). Some families (e.g., nuclear hormone receptors, T-box, and bZIP) show little preference for any STRs, whereas others (e.g., AT
Fig. 5. NN models trained on in vivo datasets recapitulate repeat effects observed in vitro and return predictions similar to statistical mechanics models. (A) Experimental pipeline: AD NN models trained on MAX ChIP-seq data predict base pair-resolution binding profiles and return hypothetical CWMs representing binding preferences. Positive and negative numbers represent nucleotides that favor and disfavor binding, respectively. (B) AD-predicted binding [Δlog(counts)] versus MITOMI-measured ΔΔGs for 26 DNA sequences containing an intact motif, a mutated motif, or scrambled sequence surrounded by either repetitive (red markers) or random (gray markers) flanking sequence. (C) DeepSHAP interpretations for a motif surrounded by a favored repeat (CG, top), a disfavored repeat (GT, middle), or random sequence (bottom). The sum of importance scores across a sequence is equal to the count prediction output of the NN. (D) Cumulative importance scores as a function of position for a favored repeat (CG, dark red), a disfavored repeat (GT, light red), or random sequence (gray). Gray box indicates motif location. (E) Schematic of sequences with E-box and 15, 30, 45, or 60 bp of disfavored AG/CT repeats. (F) DeepSHAP interpretations for 15- and 60-bp sequences from (E). (G) AD-predicted change in log(counts) (blue line, left axis) and -1 × MITOMI-measured ΔΔGs (blue markers, right axis) as a function of repeat length (relative to a sequence with a motif and random flanks). Markers and error bars show median and SD across replicates, respectively. (H) Cumulative importance scores as a function of position for sequences with E-box and 15, 30, 45, or 60 bp of AG/CT repeats. Gray box indicates motif position. (I) Schematic of sequences with E-box and increasingly scrambled GT/AC repeats. (J) AD-predicted change in log(counts) (blue line, left axis) and -1 × MITOMI-measured ΔΔGs (blue markers, right axis) for sequences shown in (I) calculated relative to reference sequence. Color indicates Shannon entropy. Markers and error bars show median and SD across replicates, respectively. (K) Cumulative importance scores as a function of position for reference sequence and sequences 6 and 7. Gray box indicates motif position.




RESEARCH | RESEARCH ARTICLE 


[Fig. 6 graphic not recoverable from the extracted text; surviving labels include Levenshtein distance, Spearman correlation (ρ) by TF structural family (e.g., AT hook, ARID/BRIGHT, homeodomain, Forkhead, GATA, Myb/SANT, bHLH, AP2, WRKY, T-box, Sox, Ets, zinc finger, bZIP), and number of TFs.]


Fig. 6. Most TFs show statistically significant binding to repetitive sequences. 
ity Z scores for 1291 TFs (columns) 
undant STR types (rows; i.e., reverse complements are 
correlation coefficients between repeat Z score and Levens
consensus across 17 different TF structural families. (E) Bar plot showing the number of TFs that prefer a particular tetranucleotide repeat shaded by TF family. (F) Scatter plots, linear regressions, and correlation coefficients for measured ΔΔGs versus summed Z scores (intensity-predicted binding) across all measured repeats for Pho4 (left) and MAX (right). (G) PSAM (left) and heat map showing 8-mer Z scores for three NHR paralogs from M. musculus (Errα, Errβ, and Errγ). (H) Pairwise comparisons of predicted binding (calculated by summing Z scores) of Errα, Errβ, and Errγ to consensus motifs surrounded on either side by 50 bp of tetranucleotide repeats.
Levenshtein distance; Fig. 6D and fig. S144). Across all TFs, AATT and CCGG repeats were the most preferred, largely because these STRs resemble known motifs for TFs in the two most abundant structural families [homeodomain and zinc finger TFs, respectively (93)] (Fig. 6E). C homopolymers were the most disfavored (fig. S145).
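The Z-score screen described in this section can be sketched as follows. Function names and the synthetic intensities are hypothetical. The canonicalization assumes that an STR type is defined up to phase (cyclic rotation) and reverse complement; under that assumption, enumerating every 8-mer built from a repeated 4-mer unit yields exactly 39 nonredundant homopolymer, dinucleotide, and tetranucleotide types. Real inputs would be per-8-mer median probe intensities from a PBM experiment.

```python
import random
import statistics
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def str_class(unit):
    """Canonical label for a repeat unit, merging phase shifts and reverse complements."""
    variants = []
    for u in (unit, revcomp(unit)):
        variants.extend(u[i:] + u[:i] for i in range(len(u)))
    return min(variants)

def str_8mers():
    """One representative 8-mer per nonredundant STR type with period 1, 2, or 4.

    Every such 8-mer is some 4-mer unit written twice, so enumerating all 256
    4-mer units also covers homopolymers and dinucleotide repeats.
    """
    classes = {}
    for letters in product("ACGT", repeat=4):
        label = str_class("".join(letters))
        classes.setdefault(label, label * 2)
    return classes

def z_scores(median_intensity):
    """Z score of each 8-mer's median probe intensity relative to all 8-mers."""
    values = list(median_intensity.values())
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return {kmer: (v - mean) / sd for kmer, v in median_intensity.items()}

# Synthetic stand-in for per-8-mer median intensities (all 4^8 = 65,536 8-mers)
rng = random.Random(1)
intensities = {"".join(p): rng.gauss(1000.0, 100.0) for p in product("ACGT", repeat=8)}
z = z_scores(intensities)
str_z = {label: z[kmer] for label, kmer in str_8mers().items()}
print(len(str_z))  # 39 STR types
```

Significance would then be assessed per TF across these 39 Z scores, with a multiple-testing correction such as the Bonferroni threshold mentioned in the text.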


Differential STR preferences could allow closely 
related paralogs to target distinct genes 


Many closely related paralogs with conserved DBDs and nearly identical consensus motif preferences bind and regulate distinct gene targets in vivo (94-96). This differential binding has been attributed to either subtle differences in motif (65) or flanking nucleotide (57, 67, 97-99) preferences or direct binding by poorly conserved regions outside of the DBD (72). As an alternative hypothesis, differential STR preferences could influence paralog-specific localization. Global comparisons of preferred STRs and preferred motifs across paralogs within a species (quantified through cosine similarity; see the materials and methods) revealed many TF pairs with highly similar motifs but divergent STR preferences (figs. S146 to S151), particularly for bHLH and nuclear hormone receptor (NHR) TF paralogs in A. thaliana and M. musculus (figs. S152 to S154).
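Cosine similarity compares two preference vectors independently of their overall scale. A minimal sketch, assuming each paralog is summarized by a motif vector and a vector of STR Z scores in a shared order; the TF names, vectors, and the 0.9/0.5 cutoffs are hypothetical and serve only to illustrate flagging "similar motif, divergent STR" pairs.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def divergent_str_pairs(profiles, motif_min=0.9, str_max=0.5):
    """Paralog pairs with similar motifs but dissimilar STR preferences."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
        motif_sim = cosine_similarity(a["motif"], b["motif"])
        str_sim = cosine_similarity(a["str"], b["str"])
        if motif_sim >= motif_min and str_sim <= str_max:
            hits.append((name_a, name_b, round(motif_sim, 2), round(str_sim, 2)))
    return hits

# Toy profiles: near-identical motif vectors, partly different STR Z-score vectors
profiles = {
    "TF_1": {"motif": [5.0, 1.0, 0.5], "str": [3.0, 0.2, -0.5]},
    "TF_2": {"motif": [5.0, 1.0, 0.5], "str": [-0.4, 2.5, 0.1]},
    "TF_3": {"motif": [4.8, 1.1, 0.4], "str": [2.9, 0.3, -0.4]},
}
print(divergent_str_pairs(profiles))
```

Here TF_2 is flagged against both other TFs, while TF_1 and TF_3 agree on both motif and STR preferences.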

Uncalibrated summed 8-mer Z scores for Pho4 and MAX binding to DNA Library 2 sequences correlated well with measured ΔΔGs (R² = 0.66 and 0.71 for Pho4 and MAX, respectively, only slightly worse than for calibrated partition function-based predictions) (Fig. 6F), suggesting that existing PBM measurements can be used to estimate binding to arbitrary sequences even without quantitative affinity measurements. Predicted binding of the Errα, Errβ, and Errγ NHR TFs from M. musculus (which have nearly identical motifs but distinct STR preferences) to sequences containing the consensus surrounded by 50 bp (on either side) of random sequence or STRs was poorly correlated between paralogs (R² = 0.01, 0.34, and 0.07 for the three pairwise comparisons; Fig. 6, G and H), consistent with the hypothesis that sensitivity to STRs could differentially localize paralogs.
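Predicting binding from uncalibrated 8-mer Z scores amounts to sliding an 8-bp window along the sequence and adding up the Z score of each window. A sketch under stated assumptions: the lookup table is keyed by one orientation of each 8-mer, so a window missing from the table is looked up by its reverse complement, and unknown 8-mers contribute zero. The table values are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def summed_z(seq, z_table, k=8):
    """Sum 8-mer Z scores over every overlapping window of `seq`."""
    total = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Fall back to the reverse complement, then to zero for unlisted 8-mers.
        total += z_table.get(kmer, z_table.get(revcomp(kmer), 0.0))
    return total

# Hypothetical Z-score table (a real one would cover all 8-mers from a PBM experiment)
z_table = {"CACGTGAC": 2.0, "TCACGTGA": 1.0, "GTGTGTGT": 0.5}
core = "GTCACGTGAC"
print(summed_z(core, z_table))                                # motif alone
print(summed_z("GTGTGTGTGT" + core + "GTGTGTGTGT", z_table))  # motif with GT flanks
```

Because every window contributes, repeated low-scoring 8-mers in the flanks add up, which is how flanking STRs can shift predicted binding even when the core motif is unchanged.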


STRs are associated with active enhancers and 
high mutation rates 


STRs can enhance or decrease TF binding energies; however, the lower bound of affinity imposed by nonspecific, electrostatic-mediated interactions skews STR effects to predominantly enhance binding (fig. S124). Consistent with a primarily activating role, STRs are most enriched within the most active enhancers [R_s² = 0.67, as measured by CAGE-seq, p300 ChIP, GRO-seq, or similar enhancer activity assays (100); control datasets shuffling enhancer sequences and measured activity show no significant correlation (R_s² = 0.16)] (Fig. 7A). STRs are also preferentially enriched in enhancers that are broadly active across 278 human cell types (R_s² = 0.85); shuffled negative control datasets show no enrichment (R_s² = 0.02) (Fig. 7B). Across various eukaryotic genomes, mutations in STRs occur several orders of magnitude more frequently than short insertions and deletions (indels, 1 to 3 bp) and base substitutions (Fig. 7C), suggesting that STRs can provide an easily evolvable mechanism to tune transcription (3, 12, 101).


Discussion 


The role of STRs in transcriptional regulation has been thoroughly documented, yet the mechanism by which they alter gene expression is poorly understood. Here, we present a model in which STRs directly bind TFs, thus establishing STRs as a class of regulatory elements. Our model is consistent with prior work suggesting that STRs tune gene expression by modulating nucleosome occupancy (3), because TF binding, especially that of pioneer factors, is the primary determinant of chromatin accessibility (102-105). However, this model allows for more sophisticated regulation: Rather than uniformly altering chromatin accessibility, STRs can differentially affect binding for even closely related TFs, serving as rheostats to precisely tune TF binding at a specific locus (81, 83, 106-108). Moreover, because there are relatively few types of STRs compared with the number of different TFs, STRs in the absence of known motifs can recruit a diverse set of TFs, thereby functioning as general regulatory elements, consistent with observations that STR-enriched enhancers are broadly active across cell types (27) (Fig. 7). Finally, STRs need not surround a TF consensus motif to have a regulatory effect; rather, they may sequester TFs for precise temporal control of transcription, as is hypothesized for pericentromeric satellites regulating the timing of chromosomal replication (109).

In contrast to the canonical model that long residence times confer specificity and function whereas TF search is nonspecific and diffusion limited (110), we found that favorable STRs surrounding target motifs alter affinities primarily by increasing apparent macroscopic TF association rates. These results contradict prior measurements suggesting that DNA sequence variation primarily affects dissociation rates; however, prior experiments did not include unlabeled competitor DNA and therefore likely observed a convolved process of dissociation and rebinding (110, 111). Thus, we join other recent work in challenging the canonical view that protein-nucleic acid binding affinities are primarily determined by dissociation rates (79). Our measurements can be explained by a simple four-state model showing that STRs enhance affinities by increasing the rate of DNA association. This is consistent with prior work suggesting that degenerate recognition sites may serve as “DNA antennae” to attract TFs to a particular regulatory site (85, 112-115). This four-state model likely underestimates the true impacts of STRs on target search because it does not explicitly consider whether TFs can move from flanking STRs to a central motif through one-dimensional sliding, hopping, and intersegmental transfer (116-119), rather than dissociating, diffusing, and rebinding. Future experiments will be required to deconvolve the kinetic contributions of nonspecific, electrostatic-mediated binding from other “testing” states for different TF structural classes, to quantify the effects of facilitated dissociation on the observed macroscopic and inferred microscopic parameters (111, 120, 121), and to develop more complex models that consider the contribution of each microstate to macroscopic kinetic parameters.
Because eukaryotic TFs recruit transcriptional coactivators through “fuzzy,” multivalent (122-124), and allovalent (77) interactions, the finding that STRs enhance the local concentration and reduce mean first passage time near genomic target sites raises the intriguing possibility that dense clusters of loosely bound TFs could enhance the recruitment of coactivator proteins to ensure fast transcriptional response kinetics. This hypothesis is supported by the observation that STRs in budding yeast are enriched near binding sites of stress response TFs (3), for which a rapid transcriptional response may be especially advantageous. The smaller size and operon structure of bacterial and archaeal genomes suggest a reduced need to speed TF search. Consistent with this, bacterial TFs tend to bind long target motifs with high affinities (108, 125, 126), and STRs make up a smaller percentage of bacterial and archaeal genomes (table S16 and figs. S155 and S156).
This case study of STRs further under- 
scores the limitations of motif-based models 
in predicting TF occupancy from sequence 
(37, 86, 127-130), because STRs composed of 
overlapping instances of even low-affinity sites 
bearing little resemblance to the known motif 
can substantially alter binding. Binding of the 
same TF to dissimilar motifs has previously 
been reported and attributed to alternate bind- 
ing modes driven by either entropic or en- 
thalpic effects (131-133). Although previous 
reports have identified repeated instances of 
motif and motif-like sequences that bind TFs 
and thereby alter gene expression (20, 134, 135), 
these observations are well explained by sim- 
ple position weight matrix models (136-138) 
that do not predict the enhanced binding to 
STRs observed here. Here, we show that sta- 
tistical mechanical models that explicitly ac- 
count for low-affinity binding substantially 
improve quantitative binding predictions for 
arbitrary DNA sequences relative to motif-based 
approaches. While this effect is particularly 
apparent for STRs, we also expect nonrepeti- 
tive sequence contexts containing many low- 
affinity binding sites to show similar effects. In 
future work, small sets of absolute affinity mea- 
surements across many TFs could be combined 
with statistical mechanical and machine learn- 
ing models to enable quantitative predictions 
of how changes in nuclear TF concentration
alter cooperation and competition between TFs 
to drive specific transcriptional programs. 

Because our statistical mechanics framework 
is agnostic to the identity of binding partners 
and considers only a distribution of binding 
energies, we anticipate that the same physical 
considerations by which DNA-binding proteins 
recognize STRs may also apply to RNA-binding 
proteins. Evidence in the literature already 
points to a role for intronic STRs in regulating 
splicing (139-150) or promoting the formation 
of RNP compartments (151-153). These obser- 
vations raise the intriguing possibility that STR- 
enriched enhancers could serve a dual function 
of binding TFs to regulate transcription and 
subsequently recruiting RNA-binding proteins 
once transcribed into enhancer RNAs. 

STRs are highly evolvable (101, 154), requir- 
ing only mispairing during replication, repair, or 
recombination to expand or contract (155-157), 
and may therefore serve as the raw material for 
evolving new cis-regulatory elements (101, 158)
and fine-tuning existing regulatory modules for 
sensitive transcriptional programs, such as those 
in development (159). This work may motivate 
future efforts to assess the evolution of reg- 
ulatory networks across species by considering 
not only conservation of nucleotides within 
motifs, but also the types and lengths of STRs 
surrounding them. The evolution of regulatory 
STRs is likely complemented by the coevolu- 
tion of TF binding preferences, consistent with 
a model in which DBDs exist as a conforma- 
tional ensemble of partially folded states in 
which single-residue substitutions alter the dis- 
tribution of states within the ensemble and 
therefore tune the specificity or promiscuity 
of binding (50, 160-163). The observation that 
STR polymorphisms disrupt gene expression 
by directly altering TF binding may provide 
new clinical insights and therapeutic directions 
for a variety of STR-associated diseases, from 
autism (29, 30) to microsatellite instability- 
associated cancers (164, 165) and others yet to 
be discovered. 
Methods summary

Microfluidic device fabrication and operation 
Microfluidic devices were fabricated and aligned 
to printed oligonucleotide or plasmid DNA ar- 
rays as described previously (48, 50). Micro- 
fluidic devices were controlled by a custom 
pneumatic manifold (166) and imaged with a
fully automated microscope and custom soft- 
ware (50, 160). 


MITOMI and k-MITOMI experiments 


Single-stranded DNA oligonucleotide libraries 
were synthesized by Integrated DNA Technol- 
ogies (IDT) and fluorescently labeled and 
duplexed with a primer extension step. eGFP- 
tagged TFs were expressed off-chip with wheat 
germ extract or PURExpress (New England 
Biolabs) and purified with anti-eGFP anti- 
bodies on the device. Printed fluorescent DNA 
was solubilized in TBS or wheat germ extract 
and allowed to bind to immobilized TFs for 
90 min before washing out unbound species and 
imaging. Binding was quantified as the ratio of 
DNA fluorescence to TF fluorescence, and the 
resulting data for multiple concentrations of 
DNA were fit to a Langmuir isotherm to extract 
$K_d$ and $\Delta\Delta G$ values. For kinetic measurements,
excess unlabeled (“dark”) dsDNA was iteratively
introduced in solution, and button valves were
opened to allow dissociation. Macroscopic
dissociation rates ($k_{\mathrm{off,apparent}}$) were obtained
by fitting the ratio of DNA fluorescence to TF
fluorescence over several time points to an
exponential decay. Apparent macroscopic association
rates ($k_{\mathrm{on,apparent}}$) were inferred by
$k_{\mathrm{on,apparent}} = k_{\mathrm{off,apparent}}/K_d$, assuming a
two-state macroscopic binding model.
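The three fitting steps above can be illustrated with a minimal pure-Python sketch. This is not the authors' analysis code, and it rests on several assumptions. The binding signal is taken to follow $B = B_{\max}[\mathrm{DNA}]/(K_d + [\mathrm{DNA}])$, and $K_d$ is found by a log-spaced grid search, which stands in for whatever nonlinear optimizer was actually used. $k_{\mathrm{off,apparent}}$ comes from a log-linear regression that assumes the signal decays to zero. $\Delta\Delta G$ is computed as $RT\ln(K_d/K_{d,\mathrm{ref}})$ against a hypothetical reference sequence. All function names, the temperature, and the synthetic numbers are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fit_langmuir(conc, signal, n_grid=4001):
    """Fit signal = b_max * c / (K_d + c).

    K_d is scanned on a log-spaced grid (1e-3 to 1e5, same units as conc);
    for each trial K_d the best b_max has a closed-form least-squares solution.
    """
    best = None
    for i in range(n_grid):
        kd = 10 ** (-3 + 8 * i / (n_grid - 1))
        f = [c / (kd + c) for c in conc]
        b_max = sum(fi * yi for fi, yi in zip(f, signal)) / sum(fi * fi for fi in f)
        sse = sum((yi - b_max * fi) ** 2 for fi, yi in zip(f, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, b_max)
    return best[1], best[2]


def fit_exponential_decay(times, ratios):
    """Fit ratio = A * exp(-k_off * t) by linear regression of ln(ratio) on t.

    Assumes the DNA/TF fluorescence ratio decays toward zero (no baseline term).
    """
    logs = [math.log(r) for r in ratios]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den


def ddg_from_kd(kd, kd_ref, temp_k=298.15):
    """Relative binding free energy (kcal/mol) versus a reference K_d."""
    return R_KCAL * temp_k * math.log(kd / kd_ref)


# Synthetic example: K_d = 50 nM, k_off = 0.002 s^-1 (hypothetical values).
conc = [1, 5, 10, 25, 50, 100, 250, 500]          # nM
signal = [2.0 * c / (50 + c) for c in conc]       # DNA/TF fluorescence ratio
kd, b_max = fit_langmuir(conc, signal)

times = [0, 60, 120, 300, 600]                    # s
ratios = [1.5 * math.exp(-0.002 * t) for t in times]
k_off = fit_exponential_decay(times, ratios)      # s^-1

k_on = k_off / kd   # nM^-1 s^-1; only meaningful under a two-state binding model
```

A grid search keeps the example dependency-free. In practice, a proper nonlinear least-squares fit with a baseline term would be the more usual choice.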


Partition function models of binding 


Partition function-based models of binding 
are based on 8-mer intensities derived from 
previously published uPBM data. uPBM data 
and associated Z scores for all possible 8-mers 
were downloaded from CIS-BP (64) and filtered 
for data quality. We predicted relative bind- 
ing energies for an arbitrary sequence to a 
given TF by splitting the sequence into overlapping
8-mers and computing the following:


$$\Delta\Delta G = c\,\beta \log \sum_j e^{\beta I_j} - \Delta G$$

where $\beta = 1/(kT)$, $I_j$ is the PBM intensity for an 8-mer $j$,
and $c$ is some calibration constant determined
by a linear fit between PBM and MITOMI data.
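The equation above could only be partly recovered from the source. The following is therefore a hedged sketch of the scoring step as reconstructed: split a sequence into overlapping k-mers, sum $e^{\beta I_j}$ over them in a numerically stable way, take the log, and scale by $c\beta$. Several details are assumptions rather than the published implementation: the reverse-complement lookup, the `dg` argument standing in for the subtracted $\Delta G$ term, and all function names. The placement of $\beta$ follows the reconstructed equation, not a verified source.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def partition_score(seq, intensity, beta, c, dg=0.0, k=8):
    """c * beta * log(sum_j exp(beta * I_j)) - dg over all overlapping k-mers j.

    `intensity` maps k-mers to PBM intensities; a k-mer missing from the table
    is looked up by its reverse complement (assumption).
    """
    seq = seq.upper()
    terms = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        value = intensity.get(kmer)
        if value is None:
            value = intensity[revcomp(kmer)]  # KeyError if neither strand is listed
        terms.append(beta * value)
    return c * beta * log_sum_exp(terms) - dg
```

With real data, `intensity` would hold all 8-mer uPBM intensities for one TF, and `c` would come from the linear fit against MITOMI measurements.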


STAMMP experiments 


Single-stranded DNA oligonucleotides were 
synthesized by IDT and fluorescently labeled 
and duplexed with a primer extension step. 
eGFP-tagged TFs were expressed and purified 
on-chip with PURExpress, and increasing con- 
centrations of fluorescently labeled dsDNA 
were flowed over the chip and allowed to 
bind for 50 min before washing and imaging. 
Binding was quantified as a ratio of DNA fluo- 
rescence to TF fluorescence, and the result- 
ing data for multiple concentrations of DNA 
were fit to a Langmuir isotherm to extract 
$K_d$ and $\Delta\Delta G$ values.


CTMC and Gillespie models 


Microscopic kinetic parameters were fit from
mean $k_{\mathrm{off,apparent}}$ and $K_d$ measurements
with a custom MATLAB script, using a four-state
continuous-time Markov chain (CTMC) kinetic model
in which a TF can be free, nonspecifically bound
and testing, bound to the motif, or bound to the
flanks. Gillespie simulations were performed using custom Python
scripts with microscopic parameters fit from 
the Pho4 CTMC model and 10,000 iterations 
per parameter set with 2600 TFs and 100,000 s 
per simulation. 
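The Gillespie step can be illustrated with a generic stochastic simulation loop. This is a hedged sketch, not the custom scripts used in the study. It follows one TF through the four CTMC states (free, nonspecific, motif-bound, flank-bound) and records the time spent in each. The rate matrix below is hypothetical and does not reproduce the fitted Pho4 parameters. A full replication would also repeat many trajectories; the study reports 10,000 iterations per parameter set.

```python
import random

STATES = ("free", "nonspecific", "motif", "flank")


def gillespie_occupancy(rates, start=0, t_max=1000.0, seed=0):
    """Simulate one CTMC trajectory with the Gillespie algorithm.

    rates[i][j] is the transition rate (s^-1) from state i to state j; the
    diagonal is ignored. Returns the total time spent in each state up to t_max.
    """
    rng = random.Random(seed)
    n = len(rates)
    occupancy = [0.0] * n
    t, state = 0.0, start
    while True:
        out = [rates[state][j] if j != state else 0.0 for j in range(n)]
        total = sum(out)
        remaining = t_max - t
        if total == 0.0:                      # absorbing state: stay until t_max
            occupancy[state] += remaining
            break
        dwell = rng.expovariate(total)        # exponential waiting time
        if dwell >= remaining:
            occupancy[state] += remaining
            break
        occupancy[state] += dwell
        t += dwell
        # Jump to state j with probability rates[state][j] / total.
        r = rng.random() * total
        acc = 0.0
        for j, k in enumerate(out):
            acc += k
            if r < acc:
                state = j
                break
        else:                                 # floating-point edge case
            state = max(j for j, k in enumerate(out) if k > 0)
    return occupancy


# Hypothetical rates (s^-1); rows and columns follow STATES. Not fitted values.
rates = [
    [0.0, 0.5, 0.0, 0.0],   # free -> nonspecific
    [2.0, 0.0, 1.0, 3.0],   # nonspecific -> free, motif, or flank
    [0.0, 0.1, 0.0, 0.0],   # motif -> nonspecific
    [0.0, 5.0, 0.0, 0.0],   # flank -> nonspecific
]
occ = gillespie_occupancy(rates, t_max=1e5, seed=42)
fractions = {name: o / 1e5 for name, o in zip(STATES, occ)}
```

Simulating many TFs at once would need a shared DNA-occupancy state. This single-molecule version ignores that.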


Affinity distillation 

ChIP-seq data for MAX were downloaded 
from the ENCODE portal (87, 88) with accession
numbers ENCSR000EZM (control) and
ENCSR000EZF (experiment). NN architecture
was adapted from BPNet (37) and trained on 
IDR peaks, with regions from chromosomes 8 
and 9 used as the test set and regions from 
chromosomes 16, 17, and 18 used as the tuning 
set for hyperparameter tuning. All NN models 
were implemented and trained in Keras (v.2.2.4;
TensorFlow backend v.1.14) (167, 168) using 
the Adam optimizer (169). AD scores [$\Delta\log(\mathrm{counts})$]
were calculated by inserting a given
sequence at the center of 100 different back- 
ground sequences and computing the mean of 
the differences between the log(count) pre- 
dictions for query sequence and background 
sequence alone, as described in (86). 
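The AD score calculation described above can be sketched as follows. Two assumptions apply. First, the query overwrites the central bases of each background so that the input length stays fixed. Second, `toy_model` is a hypothetical stand-in for the trained BPNet-style network; it simply counts CACGTG E-boxes. Only the insert-and-average logic reflects the text.

```python
import random


def insert_at_center(background, query):
    """Overwrite the central len(query) bases so the input length is unchanged."""
    start = (len(background) - len(query)) // 2
    return background[:start] + query + background[start + len(query):]


def ad_score(predict_log_counts, query, backgrounds):
    """Mean of log-count(background with query) minus log-count(background alone)."""
    diffs = [
        predict_log_counts(insert_at_center(bg, query)) - predict_log_counts(bg)
        for bg in backgrounds
    ]
    return sum(diffs) / len(diffs)


def toy_model(seq):
    """Hypothetical stand-in for a trained model's predicted log(counts)."""
    return float(seq.count("CACGTG"))


rng = random.Random(0)
backgrounds = ["".join(rng.choice("ACGT") for _ in range(1000)) for _ in range(100)]
score = ad_score(toy_model, "CACGTG", backgrounds)
```

Averaging over many backgrounds is what makes the score a property of the query rather than of any single flanking context.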


Bioinformatic analyses 


STRs in the human genome were identified 
using Tandem Repeats Finder (170). Genome 
annotations used to calculate enrichment of 
STRs in enhancers were downloaded from the 
Enhancer Atlas (100), FANTOM 5 (171), and 
HACER (172) databases. Mutation rates per 
cell division were cited or calculated as pre- 
viously described (154, 173-175). 
INTRODUCTION: Genome sequencing has revealed
extensive genetic variation in human
populations. Missense variants are genetic var- 
iants that alter the amino acid sequence of pro- 
teins. Pathogenic missense variants disrupt 
protein function and reduce organismal fitness, 
while benign missense variants have limited effect. 


RATIONALE: Classifying these variants is an
important ongoing challenge in human genetics.
Of more than 4 million observed missense
variants, only an estimated 2% have been
clinically classified as pathogenic or benign,
while the vast majority of them are of unknown
clinical significance. This limits the
diagnosis of rare diseases, as well as the
development or application of clinical treatments
that target the underlying genetic cause.
Machine learning approaches could close the
variant interpretation gap by exploiting patterns
in biological data to predict the pathogenicity
of unannotated variants. Specifically,
AlphaFold, which accurately predicts protein
structure from protein sequence, may be used
as a foundation to predict the pathogenicity of
variants on proteins.

AlphaMissense pathogenicity prediction. AlphaMissense takes as input a missense variant and predicts its
pathogenicity. We fine-tuned AlphaFold on human and primate variant population frequency data and calibrated the
confidence on known disease variants. AlphaMissense predicts the probability of a missense variant being
pathogenic and classifies it as either likely benign, likely pathogenic, or uncertain. We provide predictions for all
possible human missense variants as a resource for the community.


RESULTS: We developed AlphaMissense to lever- 
age advances on multiple fronts: (i) unsupervised 
protein language modeling to learn amino 
acid distributions conditioned on sequence 
context; (ii) incorporating structural context 
by using an AlphaFold-derived system; and 
(iii) fine-tuning on weak labels from population
frequency data, thereby avoiding bias from
human-curated annotations. AlphaMissense 
achieves state-of-the-art missense pathogenic- 
ity predictions in clinical annotation, de novo 
disease variants, and experimental assay bench- 
marks without explicitly training on such data. 
As a resource to the community, we provide a 
database of predictions for all possible single 
amino acid substitutions in the human pro- 
teome. We classify 32% of all missense variants 
as likely pathogenic and 57% as likely benign 
using a cutoff yielding 90% precision on the 
ClinVar dataset, thereby providing a confident 
prediction for most human missense variants. 

We show how this resource can be used to 
accelerate research in multiple fields. Molecular 
biologists could use the database as a start- 
ing point for designing and interpreting ex- 
periments that probe saturating amino acid 
substitutions across the human proteome. Hu- 
man geneticists could combine gene-level 
AlphaMissense predictions with population 
cohort-based approaches to quantify the func- 
tional significance of genes, especially for shorter 
human genes where cohort-based approaches 
lack statistical power. Finally, clinicians could 
benefit from the boost in coverage of con- 
fidently classified pathogenic variants when 
prioritizing de novo variants for rare disease 
diagnostics, and AlphaMissense predictions 
could inform studies of complex trait genet- 
ics that use annotations of rare, likely delete- 
rious variants. 


CONCLUSION: AlphaMissense predictions may 
illuminate the molecular effects of variants on 
protein function, contribute to the identifica- 
tion of pathogenic missense mutations and pre- 
viously unknown disease-causing genes, and 
increase the diagnostic yield of rare genetic dis- 
eases. AlphaMissense will also foster further 
development of specialized protein variant effect 
predictors from structure prediction models. 
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We
present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population
frequency databases to predict missense variant pathogenicity. By combining structural context and
evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and
experimental benchmarks, all without explicitly training on such data. The average pathogenicity score
of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing
statistical approaches are underpowered to detect. As a resource to the community, we provide a database
of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as
either likely benign or likely pathogenic.


Genome sequencing has revealed extensive
genetic variation in human populations
(1-3). Missense variants are genetic
variants that alter the amino acid se- 
quence of proteins. Pathogenic missense 
variants severely disrupt protein function and 
reduce organismal fitness, whereas benign 
missense variants have limited effects. Of the 
more than 4 million observed missense var- 
iants, only an estimated 2% have been clinically 
classified as pathogenic or benign. Classifying 
the remaining variants of unknown signif- 
icance is an important ongoing challenge in 
human genetics (3). Lack of accurate mis- 
sense variant functional predictions limits 
the diagnostic rate of rare diseases, as well as 
the development or application of clinical treat- 
ments that target the underlying genetic 
cause. Although multiplexed assays of variant 
effect (MAVEs) systematically measure pro- 
tein variant effects (4) and can accurately 
predict the clinical outcomes of variants (5), a 
proteome-wide survey of variant pathogenic- 
ity remains incomplete because of the cost and 
labor required for MAVE experiments (6). 
Machine learning approaches could close 
this variant interpretation gap by exploiting 
patterns in biological data to predict the path- 
ogenicity of unannotated variants. Machine 
learning methods follow four broad strategies. 
The first class of methods trains directly on
human-curated variant databases (7-10), there- 
by leveraging prior knowledge to inform the 
status of unannotated variants. Such strategies 
will inherit biases from the human curators 
and previous in silico predictors, and they are
prone to leaking data between training and
test splits (11).

Google DeepMind, London, UK.

*Corresponding author. Email: jucheng@google.com (J.C.);
pushmeet@google.com (P.K.); avsec@google.com (Z.A.)
†These authors contributed equally to this work.

Cheng et al., Science 381, eadg7492 (2023)

To overcome such circularity, the second 
class of methods trains with weak labels that do
not depend on human classification (12, 13). In 
the training data, “benign” variants are de- 
fined as variants frequently observed in human 
or other primate species. The “pathogenic” 
class is approximated with hypothetical var- 
iants unobserved in the human population. 
Such an approach represents a promising di- 
rection to mitigate potential human curation 
biases. However, because the training data 
contain many false labels, such models re- 
quire evaluation on more-reliable labels to as- 
sess their true performance. 

A third class of methods avoids training on
variant annotations directly and instead uses
unsupervised approaches to model the dis- 
tribution of amino acids at a given sequence 
position conditioned on an amino acid sequence
context (14-16). Recently, deep learning
models that learn high-order dependencies 
between amino acids from protein sequences, 
such as autoencoders or language models, have 
achieved strong performance (17-19). In such 
models, pathogenicity is interpreted as the 
difference in predicted log-likelihood between 
reference and alternate sequences. Although 
such models effectively capture the distribu- 
tion of naturally evolved sequences, they lack 
the state-of-the-art understanding of protein 
structure achieved by AlphaFold (AF) (20, 21).

A fourth strategy is to exploit protein struc- 
ture to reason about pathogenicity, as the 
structural context of an altered amino acid 
provides crucial information to interpret its 
effects on the protein. Initial explorations with 
predicted protein structures showed promise 
(22, 23), and estimates of genetic evolutionary 
constraint have been aided by predicted protein
structures (24). Although this strategy has
improved genetic constraint quantification, 
using this approach for pathogenicity predic- 
tion directly has shown only moderate per- 
formance on ClinVar variants (24), likely because 
of low genetic diversity observed in current 
human sequence databases. 

AF has recently shown that highly accurate 
protein structures can be predicted at scale 
using protein sequences as input (21, 25). Such
protein structure models may act as founda- 
tions for understanding other aspects of pro- 
tein biology, such as variant pathogenicity. 
Although AF is largely insensitive to input se- 
quence variation and cannot accurately predict 
structural changes upon point mutation (26), 
we hypothesized that AF’s intrinsic under- 
standing of multiple sequence alignments 
(MSAs) and protein structure provides a val- 
uable starting point for models directly pre- 
dicting the pathogenicity of missense variants. 

Here, we present AlphaMissense, which 
combines the following elements of existing 
strategies: (i) training on weak labels from 
population frequency data, avoiding circular- 
ity by not using human annotations; (ii) incor- 
porating an unsupervised protein language 
modeling task to learn amino acid distribu- 
tions conditioned on sequence context; and 
(iii) incorporating structural context by using
an AF-derived system. We achieve state-of- 
the-art predictions in clinical annotation, 
de novo disease variants, and experimental 
MAVE benchmarks, without explicitly train- 
ing our model on such data. We predict and 
characterize the pathogenicity of all single 
amino acid substitutions in the human pro- 
teome and make these predictions available to 
the community. 


AlphaMissense: Fine-tuning AlphaFold for 
variant effect prediction 


AlphaMissense takes as input an amino acid 
sequence and predicts the pathogenicity of all 
possible single amino acid changes at a given 
position in the sequence. AlphaMissense lever- 
ages two key capabilities of AF: its highly 
accurate model of protein structure and its 
capacity to learn evolutionary constraints from 
related sequences (27). Accordingly, the imple- 
mentation of AlphaMissense closely follows 
that of AF, with minor architectural differences 
(Fig. 1 and fig. S1; and see methods in the supplementary
materials). Notably, AlphaMissense
does not predict the structural changes of the 
mutated amino acid sequences but instead 
predicts pathogenicity as scalar values. 
AlphaMissense is trained in two stages. In 
the first stage, the network is trained like AF 
to perform single-chain structure prediction 
(AF pretraining) along with protein language 
modeling by predicting the identity of the 
amino acids masked at random positions in 
the MSA. We introduced a few minor architecture
modifications to AF and increased the loss weight
on the protein language modeling objective while
still achieving structure prediction performance
comparable to that of AF (see methods). After
pretraining, the masked language modeling head can
already be used for variant effect prediction by
computing the log-likelihood ratio between the
reference and alternative amino acid probabilities,
as done in MSA Transformer (27) and Evolutionary
Scale Modeling [ESM (28)].

In the second stage (Fig. 1A), the model is
fine-tuned on human proteins with an additional
variant pathogenicity classification objective
defined for a variant sequence presented in the
second row of the MSA (Fig. 1A). For the training
set, we assign benign labels to variants frequently
observed in the human and primate populations, and
pathogenic labels to variants absent from human and
primate populations, as is done in PrimateAI (12)
(Fig. 1B; see methods). We stop training the
model once it starts to overfit on the validation
set (2526 ClinVar variants with an equal number of
pathogenic and benign variants per gene; see methods).

Our training set is inherently noisy, because
many unobserved variants are potentially benign,
but it offers enough learning signal to improve
the variant pathogenicity score compared
with pretraining alone. To increase the quality
and size of the training set, we employ
self-distillation by using preliminary AlphaMissense
models to filter out unobserved variants predicted
to be likely benign. The fine-tuning stage
is then repeated using this filtered training set
(see methods). Further innovations include a
custom classification loss function, sampling
multiple variants during training, improving
the matched sampling of variants, and weight
decay during fine-tuning toward the pretrained
parameter values (see methods and the ablation
studies section below).

Fig. 1. Overview of AlphaMissense. (A) AlphaMissense architecture. The model
inputs consist of the reference protein sequence [cropped to length (L) = 256 residues],
a set of variants sampled from the training set for the same sequence (up to N = 50 variants),
and multiple sequence alignments (MSAs, up to 2048 sequences). Inference is performed
for one variant at a time (N = 1). The reference sequence is repeated in the second row
of the MSA with all sampled variant positions masked (see methods). As in AlphaFold,
the model constructs the pair representation (i.e., encodes information about two-way
interactions between residues) from the reference sequence (embedding size $K_{\mathrm{pair}}$),
and the MSA representation from the masked MSA (embedding size $K_{\mathrm{msa}}$). The MSA and
pair representations are processed by a stack of Evoformer layers with recycling. Finally,
the model predicts the structure of the reference sequence and the pathogenicity score
($s_i^a$) for the variant, which is derived from the masked residue prediction head as the
log-likelihood difference between residue a and the reference residue at position i
(see methods). (B) The pathogenicity score is fine-tuned as a binary classification of
variants as benign (observed or frequent missense variants in human or primate
populations) or pathogenic (unobserved human missense variants). We split the benign
variants into clusters by their minor allele frequency (MAF) and introduce weights in the
loss function that reduce the contribution of rare variants. For each observed variant in
the benign set, we sample a missense variant from the pathogenic set and assign it the
same loss weight as for the benign variant (see methods). (C) We evaluated AlphaMissense
on a diverse set of benchmark datasets, including annotated missense variants in
ClinVar (30), de novo disease variants (54), and MAVE data collected in ProteinGym (19).
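The pretraining-stage variant effect score mentioned above can be sketched in a few lines. It is the log-likelihood ratio of the alternative versus the reference amino acid at a masked position, as in MSA Transformer and ESM, and it only needs the per-position probabilities. The sketch assumes the masked-residue head's output is available as a dict of logits or probabilities. It is not AlphaMissense's fine-tuned, calibrated pathogenicity score. The sign convention (negative means less likely than the reference) is the usual zero-shot one, not something stated in the text.

```python
import math


def softmax(logits):
    """Convert a dict of per-amino-acid logits into probabilities."""
    m = max(logits.values())
    exps = {aa: math.exp(v - m) for aa, v in logits.items()}
    z = sum(exps.values())
    return {aa: e / z for aa, e in exps.items()}


def llr_scores(probs, ref_aa):
    """log p(alt) - log p(ref) for every alternative amino acid at one masked position.

    Negative values mean the model finds the substitution less likely than the reference.
    """
    log_ref = math.log(probs[ref_aa])
    return {aa: math.log(p) - log_ref for aa, p in probs.items() if aa != ref_aa}
```

In use, one would mask position i, read the head's 20 logits, pass them through `softmax`, and call `llr_scores(probs, ref_aa)` to score all 19 substitutions at once.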


Improved pathogenicity classification across 
multiple clinical benchmarks 


Clinical databases collect missense variants 
that cause human disease. These databases 
can be used to benchmark pathogenicity pre- 
diction models, but such data contain human 
biases and may misrepresent the true dis- 
tribution of clinically relevant variants (see 
supplementary note in the supplementary 
materials). Models trained on these databases 
(ClinVar, for example) inherit these biases and 
often fail to generalize to other benchmarks 
(11, 29). We avoid training directly on clinically 
curated labels to mitigate such issues and en- 
able faithful evaluation on diverse benchmarks, 
including the held-out test set of annotated 
missense variants in ClinVar (30), de novo variants
from patients with rare developmental
disorders and controls (12), MAVE benchmarks
in ProteinGym (19), and additional
MAVE benchmarks curated in this study (Fig. 
1C; see methods). 

We first evaluated our model on ClinVar 
missense variants. After balancing the number 
of pathogenic and benign variants per gene, 
AlphaMissense achieves an area under the 
receiver operating characteristic curve (auROC) of 0.940 on
18,924 ClinVar test variants, compared with 
an auROC of 0.911 achieved by the Evolution- 
ary model of Variant Effect (EVE; P = 0.001, 
bootstrap), the next best model that did 
not train directly on ClinVar (17) (Fig. 2A). 
AlphaMissense also outperforms models trained 
directly on ClinVar, despite these models ex- 
hibiting data leakage and label circularity (Fig. 
2A; see supplementary note) (11, 17, 29). Further- 
more, we observe that AlphaMissense is cap- 
able of distinguishing pathogenic from benign 
ClinVar variants within regions of high evolutionary
constraint (31), and it outperforms the
best competing models on this task (ESM1b, 
P = 0.001, bootstrap) (fig. S2A). This result 
suggests that the model is not merely relying 
on identifying constrained domains but is 
capturing differences in the effect of individ- 
ual variants within those domains. Our model 
performance is consistent across different 
AlphaFold confidence levels (fig. S2B). How- 
ever, we note reduced performance on var- 
iants from residues predicted to be disordered 
(fig. S2C). 

Clinical assessment of variants often focuses 
on specific disease-associated genes, and dis- 
criminating between benign and pathogenic 
variants within such genes is an important, 
clinically relevant task for predictive models. 
To understand AlphaMissense model perfor- 
mance on this task, we analyzed the 612 genes 
with at least five pathogenic and five benign 
variants in the ClinVar test set. For these 
genes, we calculated the gene-level auROC, 
which captures the model’s performance at 
classifying variants within an individual gene. 
When evaluated in this way, AlphaMissense 
outperforms the next best method that did not 
train directly on ClinVar, EVE (17), with aver- 
age gene-level auROC of 0.950 versus 0.921 
(P = 0.001, bootstrap) (Fig. 2B). 
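The gene-level auROC described above can be computed as in the sketch below. It treats auROC as the probability that a pathogenic variant outscores a benign variant in the same gene, with ties counting one-half. It keeps only genes with at least five variants of each label and averages over them. The record format and function names are hypothetical, and the bootstrap used for the reported P values is omitted.

```python
from collections import defaultdict


def auroc(pos_scores, neg_scores):
    """P(random pathogenic score > random benign score); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_gene_auroc(records, min_per_class=5):
    """records: iterable of (gene, score, is_pathogenic).

    Genes with fewer than `min_per_class` pathogenic or benign variants are skipped.
    Returns (mean auROC over retained genes, {gene: auROC}).
    """
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, is_pathogenic in records:
        by_gene[gene][0 if is_pathogenic else 1].append(score)
    per_gene = {
        gene: auroc(pos, neg)
        for gene, (pos, neg) in by_gene.items()
        if len(pos) >= min_per_class and len(neg) >= min_per_class
    }
    mean = sum(per_gene.values()) / len(per_gene) if per_gene else float("nan")
    return mean, per_gene
```

The pairwise loop is quadratic per gene, which is acceptable at ClinVar gene sizes. A rank-based formula would scale better.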

We further assessed the performance of 
AlphaMissense on two important sets of pro- 
teins. The first set comprises proteins encoded 
by the clinically actionable genes prioritized 
by the American College of Medical Genetics 
(ACMG) (32), which has recommended that 
clinical exome and genome sequencing of 
these genes be returned as secondary findings 
in the clinic because of their clear disease 
phenotypes and highly penetrant mutations. 
For the 34 ACMG genes with sufficient ClinVar 
labels and scores from both methods, 26 genes 
(77%) see improvements using AlphaMissense 
pathogenicity predictions over EVE (fig. S3A). 
The second set comprises proteins prioritized for
future MAVE studies by the community on the 
basis of clinical relevance and experimental 
tractability (33). For the 20 genes with suffi- 
cient ClinVar labels and scores from both 
methods, improvements were seen relative to 
EVE for 16 genes (80%) using AlphaMissense 
pathogenicity predictions (fig. S3B). 

Finally, we evaluated AlphaMissense on the 
Deciphering Developmental Disorders (DDD) 
benchmark, where AlphaMissense achieves an 
auROC of 0.809, on par with PrimateAI (auROC = 
0.797, P = 0.31, bootstrap) (12) (Fig. 2C). We 
also evaluated our model on classifying cancer 
hotspots, where AlphaMissense achieves an 
auROC of 0.907 compared with 0.885 for the 
next-best model, VARITY (P = 0.001, boot- 
strap) (9) (fig. S2D). Overall, AlphaMissense 
achieves state-of-the-art performance across 
all curated clinical benchmarks, whereas no 
other previously reported model consistent- 
ly ranks highly across these benchmarks. 


Calibrated AlphaMissense predictions expand 
the number of confidently classified variants 
relative to other methods 


Having established state-of-the-art performance 
of AlphaMissense on clinical benchmarks, we 
next generated and analyzed proteome-wide 
predictions. We used AlphaMissense to pre- 
dict the pathogenicity of all 216 million pos- 
sible single amino acid changes across the 
19,233 canonical human proteins, resulting in 
71 million missense variant predictions satu- 
rating the human proteome (see methods). 
Practical use of predicted scores requires 
careful calibration against the gold-standard 
set of clinically curated pathogenic and benign 
variants. We used the balanced validation set 
with 2526 variants from ClinVar (see methods) 
to calibrate our predictions using a univar- 
iate logistic regression model. This approach 
yields calibrated scores, as shown on the 
ClinVar test set (Fig. 2D; see methods). Cali- 
brated AlphaMissense scores (ranging between 
0 and 1) can be interpreted as the approximate 
probability of a variant being clinically patho- 
genic. We note that as the majority of predictions 
are close to 0 or 1, the calibrations for scores 
between 0.2 and 0.8 are likely less accurate. 
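A univariate logistic-regression calibration of the kind described above is often called Platt scaling. It can be sketched without external libraries using Newton's method. This is an illustration, not the authors' procedure. In real use, `scores` would be raw AlphaMissense outputs for the balanced validation variants and `labels` their ClinVar classes (1 = pathogenic); here both are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_calibration(scores, labels, iters=50):
    """Fit P(pathogenic) = sigmoid(a * score + b) by Newton's method (maximum likelihood).

    labels: 1 = pathogenic, 0 = benign. Assumes the classes are not perfectly
    separable; otherwise the maximum-likelihood slope diverges.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            g_a += (p - y) * x          # gradient of the negative log-likelihood
            g_b += p - y
            h_aa += w * x * x           # Hessian entries
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break
        a -= (h_bb * g_a - h_ab * g_b) / det   # Newton step: theta -= H^-1 g
        b -= (h_aa * g_b - h_ab * g_a) / det
    return a, b


def calibrate(score, a, b):
    """Map a raw score to an approximate probability of being pathogenic."""
    return sigmoid(a * score + b)
```

Because the fit has an intercept, the calibrated probabilities on the fitting set sum to the number of pathogenic labels. This gives a quick sanity check for convergence.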
Next, we used our calibrated prediction 
scores to classify variants into three discrete 
categories similar to ACMG terminology (32, 34): 
likely pathogenic, likely benign, and ambiguous 
[cutoffs were chosen such that variants classi- 
fied as likely pathogenic or likely benign have 
90% expected precision estimated from ClinVar 
for both classes, as done in (17)] (fig. S4A). Owing 
to higher predictive performance, the fraction 
of ClinVar test variants that we can confidently 
classify with 90% precision is increased by 25.8 
percentage points (from 67.1% to 92.9%) compared
with the recent well-performing unsupervised
model EVE (17) (Fig. 2E and fig. S4B). This
approach offers a major expansion in the number
of variants with confident predictions in a
proteome-wide context. 
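The sketch below picks two cutoffs so that each called class reaches a target precision, then measures the resolved (unambiguous) fraction. Scanning over the observed scores is an assumption about how such cutoffs might be located, not the published method. The 0.9 default mirrors the 90% precision target in the text.

```python
def precision_cutoffs(scores, labels, target=0.9):
    """Return (benign_cutoff, pathogenic_cutoff) that reach `target` precision.

    labels: 1 = pathogenic, 0 = benign. Scores >= pathogenic_cutoff are called
    likely pathogenic; scores <= benign_cutoff are called likely benign.
    Either cutoff is None if the target cannot be reached.
    """
    candidates = sorted(set(scores))
    path_cut = None
    for t in candidates:                      # lowest cutoff first = most variants called
        called = [y for s, y in zip(scores, labels) if s >= t]
        if sum(called) / len(called) >= target:
            path_cut = t
            break
    benign_cut = None
    for t in reversed(candidates):            # highest cutoff first
        called = [y for s, y in zip(scores, labels) if s <= t]
        if (len(called) - sum(called)) / len(called) >= target:
            benign_cut = t
            break
    return benign_cut, path_cut


def resolved_fraction(scores, benign_cut, path_cut):
    """Fraction of variants assigned to either confident class."""
    resolved = sum(
        1 for s in scores
        if (path_cut is not None and s >= path_cut)
        or (benign_cut is not None and s <= benign_cut)
    )
    return resolved / len(scores)
```

Variants between the two cutoffs remain ambiguous. Raising `target` widens that middle band and lowers the resolved fraction, which is the trade-off plotted in Fig. 2E.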


Overall properties and examples of 
AlphaMissense predictions 


To understand the overall properties of the 
predictions, we compared them against the 
effective number of sequence alignments ($N_{\mathrm{eff}}$
score), genetic constraint, and predicted pro- 
tein disorder (fig. S4, C to F). Residues with a 
low effective number of aligned sequences and 
hence lower conservation levels tend to have 
lower predicted pathogenicity (fig. S4C). This 
relationship is less pronounced when looking 
at aggregated protein-level results (fig. S4D), 
suggesting that AlphaMissense captures do- 
main conservation within a protein, rather than 
overall protein-level evolutionary conservation. 
Similarly, variants located in evolutionarily
constrained genes are systematically predicted 
as more pathogenic compared with those in 
unconstrained genes (fig. S4E). Variants lo- 
cated in structured regions, which may alter 
protein stability (35, 36), are associated with 
higher pathogenicity scores than variants lo- 
cated in disordered regions (fig. S4F; protein 
disorder is predicted with AlphaFold). This is 
consistent with recent observations that known 
disease-causing variants are more likely to re- 
side in thermally stable proteins (37). 

To further understand properties of amino 
acid substitutions learned by AlphaMissense, 
we computed the mean predicted pathogenic- 
ity per amino acid substitution across all 
human proteins (fig. S4G). As expected, muta- 
tions in aromatic amino acids or cysteine are 
more likely to be pathogenic given their role in 
maintaining protein structure. The predicted 
substitution scores are asymmetric, as previ- 
ously reported (38), and correlate with the 
BLOSUM62 (39) substitution matrix overall 
[correlation coefficient (r) = −0.61; fig. S4H]
and per reference amino acid (fig. S4I). Together,
these results suggest that the model is
using both the structural information and evo- 
lutionary information present in the MSA to 
make predictions consistent with known biology. 
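The per-substitution averaging and the comparison with BLOSUM62 described above can be sketched as follows. Keys are directional (ref, alt) pairs, so the resulting matrix can be asymmetric. The reader must supply the BLOSUM62 values as a dict keyed the same way; no substitution-matrix values are included here, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def mean_by_substitution(predictions):
    """predictions: iterable of (ref_aa, alt_aa, score).

    Returns {(ref, alt): mean score}. Directional keys keep (ref, alt) and
    (alt, ref) separate, so asymmetries are preserved.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ref, alt, score in predictions:
        totals[(ref, alt)] += score
        counts[(ref, alt)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlate_with_matrix(means, matrix):
    """Pearson r between mean scores and a substitution matrix {(ref, alt): value}."""
    keys = [k for k in means if k in matrix]
    return pearson([means[k] for k in keys], [matrix[k] for k in keys])
```

A negative r is expected here. Substitutions that BLOSUM62 scores as common (positive) should receive lower mean pathogenicity.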

We visualized pathogenicity predictions along- 
side ClinVar labels (Fig. 2, F and G, left panels) 
and AF predicted protein structures (Fig. 2, 
F and G, right panels). General trends can be 
observed for these specific proteins. For in- 
stance, structurally disordered regions are aligned 
with benign predictions and benign clinical 
annotations, consistent with the proteome- 
wide results (fig. S4F). In particular cases, the 
pathogenicity predictions make sense in light 
of the protein function. For example, we pre- 
dict the transmembrane domain of ACVRL1 
(amino acids 119 to 141) to be more tolerant to 
mutation than either of the globular domains, 
which represent enzymatic or protein-protein 
interaction sites (Fig. 2F). 




RESEARCH | RESEARCH ARTICLE 


[Fig. 2 panels: (A) auROC on ClinVar (class-balanced, 18,924 variants), methods trained on ClinVar marked; (B) average per-gene auROC; (D) calibration curve of fraction of pathogenic variants versus AlphaMissense score, with score histograms; (F) ACMG genes, AM pathogenicity versus residue position for ACVRL1/P37023 (auROC = 0.965; transmembrane domain marked), RYR1/P21817 (auROC = 0.954), and DSP/P15924 (auROC = 0.97).]

Fig. 2. Performance of AlphaMissense on clinically curated classification benchmarks. Benchmarks are evaluated by area under the receiver operating characteristic curve (auROC). Error bars show the 95% confidence interval of 1000 bootstrap resamples (see methods). A few manually chosen methods are colored to illustrate their relative position on different benchmarks. (A) Performance on classification of ClinVar variants (9462 pathogenic and 9462 benign variants from 999 proteins), balancing the number of positive and negative variants per gene. Methods shown in gray were trained directly on ClinVar. Some of their training variants are contained in this test set, so their performances are likely overestimated. (B) Average per-gene auROC on the ClinVar test set. A total of 612 proteins with at least five benign and five pathogenic ClinVar test variants are considered. (C) Comparison of AlphaMissense and other predictors on distinguishing de novo variants from DDD cohort patients and healthy controls (12). A total of 353 patient variants and 57 control variants from 215 DDD-related genes are considered. We excluded EVE because of its low coverage of variants in this dataset (227/410 variants). (D) The AlphaMissense scores were calibrated on the class-balanced ClinVar validation set (see methods). The figure shows the calibration curve, which plots the average score against the fraction of pathogenic variants per bin, computed on the ClinVar test set (82,872 variants). The error bars represent 95% confidence intervals computed from 1000 bootstrap resamples. The histograms show the distribution of scores among pathogenic (red) and benign (blue) variants. (E) Fraction of resolved (unambiguous) missense variants at different levels of target precision. Precision is defined as the fraction of correct predictions among both pathogenic and benign class predictions. The resolved fractions are computed with ClinVar test set variants from proteins scored by EVE (dark lines, all) and then filtered to proteins with at least 3, 5, or 10 variants for each label (lighter lines). (F) Example proteins chosen from ACMG clinically actionable genes (32). Protein names are written as “[HUGO symbol]/[UniProt accession ID].” (Left panels) Missense variants, represented as points, are plotted against


AlphaMissense achieves state-of-the-art 
agreement with multiplexed assays of variant effect 
MAVE experiments generate “proactive” maps 
of variant effects (40) by expressing protein 
variants in cells and measuring activity using 
growth or fluorescence readouts. Because MAVE 
experiments densely cover (and often saturate) 
the protein of interest, they provide valuable in- 
formation on protein regions otherwise missed 
by sparse clinical curations, although the direct 
clinical utility of MAVE data depends on the 
assay readout and experimental quality (41). 
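MAVE readouts sit on assay-specific scales, and the SHOC2 heatmap in Fig. 3D shows them after percentile normalization (0 = function retained, 1 = function lost). The exact normalization is described in the supplementary materials; the sketch below is only one plausible reading, assuming a rank-based percentile, lower raw fitness meaning loss of function, and tied readouts sharing their average rank. `percentile_normalize` is a hypothetical name.

```python
def percentile_normalize(raw, loss_is_low=True):
    """Rank-transform raw MAVE scores onto [0, 1].

    Returns 1.0 for the strongest loss of function and 0.0 for the weakest.
    Tied raw scores share their average rank. With loss_is_low=True, lower
    raw fitness is treated as loss of function (an assumption about the assay).
    """
    n = len(raw)
    if n < 2:
        return [0.5] * n  # a single variant has no meaningful percentile
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied raw values that starts at i.
        while j + 1 < n and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2  # 0-based average rank
        i = j + 1
    pct = [r / (n - 1) for r in ranks]
    return [1.0 - p for p in pct] if loss_is_low else pct
```

For a fitness-style assay, the variant with the lowest growth ends up at 1.0, matching the "red = loses function" convention of the heatmap.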

To assess the agreement between AlphaMissense and MAVE studies, we benchmarked predictions against two sources of MAVE data: 1.5 million variants from 72 proteins collected in ProteinGym (19) and an additional benchmark set consisting of 20 recently published human proteins not contained in ProteinGym (see methods). Relative to other methods, AlphaMissense agrees most strongly with MAVE data (mean Spearman correlation on ProteinGym: 0.514; on the additional MAVE benchmark: 0.450; Fig. 3, A to C). When restricting to only those amino acid variants from 25 human proteins that are scored by all methods, AlphaMissense remains the highest scoring of the 13 methods in ProteinGym (mean Spearman correlation: 0.474; Fig. 3B). AlphaMissense improves predictions for most proteins within both benchmarks compared with the next-best models [62/72 relative to the Global Epistatic Model for Predicting Mutational Effects (GEMME) (16) in ProteinGym, 60/72 relative to EVE in ProteinGym, and 13/20 relative to ESM1v in the additional MAVE benchmark] (fig. S5, A and B).
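The headline MAVE metric is a Spearman correlation computed per protein and then averaged across proteins. A self-contained sketch, assuming each protein's predictions and measurements are paired lists oriented so that larger MAVE values mean stronger loss of function (negate fitness-style readouts first); the data layout and function names are illustrative. In practice `scipy.stats.spearmanr` computes the same tie-aware statistic.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def mean_per_protein_spearman(benchmark):
    """benchmark: dict mapping protein -> (predicted scores, MAVE measurements)."""
    return mean(spearman(pred, obs) for pred, obs in benchmark.values())
```

Averaging per protein, rather than pooling all variants, keeps large proteins from dominating the benchmark score.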

We compared the observed MAVE data and 
available model predictions against the exper- 
imentally resolved protein structures and do- 
main annotations for disease-relevant proteins. 
The SHOC2 protein forms a complex with 
MRAS and PP1C to activate the Ras-MAPK
(mitogen-activated protein kinase) signaling 
pathway in cancer (42). AlphaMissense path- 
ogenicity correlates with MAVE data that 
measure the impact of SHOC2 variants on 
Ras-activated cancer cell fitness (43) (Spearman 
correlation: 0.47), outperforming ESM1v, ESM1b, 
and EVE (Spearman correlation: 0.41, 0.40, and 
0.32, respectively; fig. S5B). 
See also fig. S3B.


We investigated whether AlphaMissense better captures pathogenicity driven by specific domains within SHOC2, which would be reflected by the average pathogenicity at each amino acid position. AlphaMissense per-position average pathogenicity agrees strongly with the MAVE per-position average (positional Spearman correlation: 0.64), outperforming ESM1b, ESM1v, and EVE (positional Spearman correlation: 0.56, 0.55, and 0.48, respectively; fig. S5C). Of the first 80 amino acids of SHOC2, positions 63 to 74 were pathogenic according to the MAVE assay (Fig. 3D). This region was structurally shown to bind PP1C through an RVxF motif (43) (Fig. 3E). Among the models compared, AlphaMissense is the only one to correctly predict pathogenic effects of mutations in this functionally important region (Fig. 3D and fig. S5D). Additionally, after the 80th position, both the MAVE data and AlphaMissense predictions peak in pathogenicity approximately every 23 amino acids (Fig. 3D), matching the periodicity of the 20 leucine-rich repeat domains, which contact MRAS and PP1C (Fig. 3, D and E). Overall, residues directly contacting either MRAS or PP1C score as highly pathogenic (median AlphaMissense pathogenicity: 0.98 and 0.96, respectively), nearly as high as core hydrophobic residues (median: 0.99) and higher than surface residues that do not form protein-protein contacts (median: 0.51; fig. S5E).
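The positional metric first averages over all substitutions at each position, for predictions and measurements separately, and only then correlates the two per-position profiles; it therefore rewards getting domain-level patterns right rather than individual substitutions. A sketch of the aggregation step, assuming a hypothetical record layout `(position, alt_aa, predicted, observed)`; the two returned lists can then be passed to any Spearman implementation, such as `scipy.stats.spearmanr`. Grouping by the substituted amino acid instead (`group_index=1`) gives a per-amino-acid average substitution effect.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, group_index=0):
    """Per-group averages of predicted and observed effects.

    records: iterable of (position, alt_aa, predicted, observed) tuples.
    group_index=0 groups by position (the positional metric);
    group_index=1 groups by substituted amino acid.
    Returns (keys, predicted_means, observed_means), aligned and sorted by key.
    """
    pred, obs = defaultdict(list), defaultdict(list)
    for rec in records:
        key = rec[group_index]
        pred[key].append(rec[2])
        obs[key].append(rec[3])
    keys = sorted(pred)
    return keys, [mean(pred[k]) for k in keys], [mean(obs[k]) for k in keys]
```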

Next, we sought to determine whether the average substitution effect of each of the 20 possible amino acids, driven by their chemical properties, is better reflected in AlphaMissense than in other models. For SHOC2, AlphaMissense agrees most strongly with the measured per-amino acid average substitution effect compared with other models (fig. S5C). Overall, when calculated this way across all proteins in ProteinGym and the additional MAVE benchmark, AlphaMissense displays the highest average performance across both the amino acid substitution and the positional metrics, suggesting that improvements in both domain-level pathogenicity prediction and the modeling of amino acid properties underlie model performance (mean positional Spearman correlation on ProteinGym: 0.54; mean substitution Spearman correlation on ProteinGym: 0.545; fig. S5F).

AlphaMissense (AM) pathogenicity scores (y axis) and amino acid positions (x axis). Variants predicted as likely pathogenic are shown in red, variants predicted as likely benign are shown in blue, and ambiguous variants are shown in gray. If a variant has a clinical label in ClinVar, it is plotted as a solid circle. For proteins longer than 1400 amino acids, the first 1400 are shown. (Right panels) The protein structure prediction from AlphaFold is shown for the selected region. Each residue in the predicted structure is colored according to the average AlphaMissense pathogenicity score of that residue (out of 19 possible amino acid changes per residue). See also fig. S3A. (G) The same as (F), but for examples chosen from genes prioritized by the MAVE community for further study (33).

Another example protein is the human glucose sensor GCK. Variants that decrease GCK activity can cause maturity-onset diabetes of the young (MODY) (44). AlphaMissense pathogenicity correlates with MAVE data measuring the fitness of auxotrophic yeast strains expressing human GCK variants in the presence of glucose (45) (Spearman correlation: 0.53), outperforming ESM1v, EVE, and ESM1b (Spearman correlation: 0.49, 0.48, and 0.45, respectively; fig. S5B). GCK primarily functions to catalyze the phosphorylation of glucose; the catalytic residue Asp205 (D205) is the highest-ranked residue by average AlphaMissense pathogenicity (0.999), and other residues in direct contact with the ligand are similarly pathogenic (Fig. 3F). AlphaMissense pathogenicity is associated with decreased fasting glucose in patients harboring missense variants in GCK (Spearman correlation: −0.49) (45). AlphaMissense pathogenicity exhibits a log-linear relationship with in vitro GCK activity measurements for 36 clinical variants (Spearman correlation: −0.65; Fig. 3G). This agreement falls short of experimental accuracy, as estimated by the MAVE data (Spearman correlation: 0.75), but is closer to it than other prediction methods (Spearman correlation for ESM1v: 0.61; ESM1b: 0.50; EVE: −0.50; fig. S5G). Variants that AlphaMissense scores as highly pathogenic have GCK activity orders of magnitude lower than wild type, consistent with the fact that most clinically confirmed pathogenic GCK variants are MODY variants with decreased activity. On the other hand, a small number of hyperactive pathogenic variants [clustered near the allosteric site, e.g., Thr65→Ile (T65I)] (Fig. 3F and fig. S5H) can cause hyperinsulinemic hypoglycemia (44). AlphaMissense more often classifies these as ambiguous or benign (Fig. 3G).
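The log-linear relationship in Fig. 3G amounts to regressing pathogenicity on the logarithm of the relative activity index (1 = wild type). A least-squares sketch, assuming base-10 logs and strictly positive activity values; `log_linear_fit` is a hypothetical helper, and the dashed fit in Fig. 3G may have been computed differently. Because hyperactive variants (index above 1) can also be pathogenic, a single monotonic fit like this one is not expected to capture them.

```python
import math
from statistics import mean


def log_linear_fit(activity, pathogenicity):
    """Ordinary least squares for pathogenicity = a + b * log10(activity).

    activity: relative activity index per variant (1 = wild type), all > 0.
    Returns (intercept a, slope b); b < 0 when lower activity means higher
    predicted pathogenicity.
    """
    if any(v <= 0 for v in activity):
        raise ValueError("activity values must be positive for a log scale")
    x = [math.log10(v) for v in activity]
    mx, my = mean(x), mean(pathogenicity)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct activity values")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, pathogenicity))
    b = sxy / sxx
    return my - b * mx, b
```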


Ablating components of AlphaMissense 
reveals key drivers of performance 


Given the improved performance of Alpha- 
Missense on different benchmarks, we next 
investigated which components are neces- 
sary for its high performance on ClinVar and 
ProteinGym test sets by systematically removing 
components of the model in an ablation study. 
We focused on three types of components: structure prediction, variant sampling, and training data.

Fig. 3. AlphaMissense achieves state-of-the-art agreement with multiplexed assays of variant effect. (A) Performance on MAVE benchmarks. ProteinGym (19) is a collection of 72 proteins with MAVE data. The distribution of per-protein Spearman correlations between predictions and ProteinGym MAVE data for each model is shown, with the mean value shown as a dot (and numerically above the violin plot). (B) Performance comparison on a subset of ProteinGym variants (608,175 variants, 25 human proteins) that were scored by all methods. Dots represent the mean Spearman correlation across proteins per method, which is also shown numerically above each violin plot. (C) We curated an additional benchmark dataset of 20 human proteins not included in ProteinGym. The distribution of per-protein Spearman correlations between predictions and additional MAVE data is shown. (D) Heatmaps of observed and predicted effects of amino acid substitutions on the first 200 amino acids of SHOC2. (Top heatmap) Observed pathogenicity as measured by a MAVE assay of cell growth in cancer cells dependent on SHOC2 (43). Scores are percentile-normalized measurements from the experimental assay. Variants with scores closer to zero (blue) retain SHOC2 function, whereas scores closer to one (red) lose SHOC2 function. (Middle and bottom heatmaps) AlphaMissense (AM) and EVE pathogenicity scores, respectively. Both scores range from zero to one, with higher scores corresponding to increased pathogenicity. Variants with no prediction are colored gray (see EVE heatmap). Domain-level annotations (Annot.), including RVxF and leucine-rich repeat (LRR) regions, are shown above the heatmaps. Residue-level annotations are also shown [as calculated in (43) from Protein Data Bank (PDB) ID 7UPI], representing surface, core, and protein-protein interaction residues. (E) Experimentally derived structure of SHOC2 (blue and red) in complex with MRAS (yellow) and PP1C (gold) [PDB ID 7UPI (43)]. The mean AlphaMissense pathogenicity score per position is shown in the SHOC2 structure, with blue corresponding to benign and red corresponding to pathogenic. (Insets) Close-ups of the RVxF binding region of SHOC2 contacting PP1C, and the LRR region contacting MRAS and PP1C. (F) Experimentally derived structure of GCK (blue and red) [PDB ID 3F9M (55)]. The mean AlphaMissense pathogenicity score per position is shown in the GCK structure, with blue corresponding to benign and red corresponding to pathogenic. The active site ligand (yellow) and an allosteric inhibitor (green) are also shown. (Insets) Close-ups of residues that contact the ligand (such as D205, the catalytic site) and the residues that bind the allosteric inhibitor (such as T65I). (G) Comparison of relative activity index for glucokinase mutants (56) against AlphaMissense pathogenicity. On the log x axis, a score of one indicates in vitro activity equivalent to wild type, a score lower than one indicates less activity, and a score above one indicates hyperactivity. Spearman correlation is shown in the lower left of the panel. Each dot represents a different protein variant, colored according to AlphaMissense classification thresholds. The shape indicates the clinical label (45). The dashed line shows the linear fit between in vitro measurement and AlphaMissense pathogenicity. T65I, which causes hyperinsulinemic hypoglycemia (HH), is labeled. MODY, maturity-onset diabetes of the young.

We found that both AF pretraining and fine-tuning stages are essential for good performance (“No AF pretraining” and “No fine-tuning on missense variants” in fig. S6). Furthermore, we found that pretraining with masked MSA alone, without structure prediction, is not sufficient for good performance (“No structure loss during AF pretraining”), suggesting that both structure prediction and protein language modeling across a large corpus of samples contribute to the overall performance (fig. S6). Sampling variants to account for gene bias in the training set and sampling multiple variants within the training sequence crop are both important to reduce gene-level bias and regularize the model (fig. S6). Variant self-distillation helped on the ProteinGym task but not on the ClinVar task. Similarly, we found that additional training variants from primates or extremely low minor allele frequency (MAF) variants from humans are only mildly helpful on the ProteinGym task and do not help on ClinVar (fig. S6). Overall, these results emphasize the importance of both training stages: pretraining on a large database of structures and fine-tuning directly for the target application.

Gene-level AlphaMissense pathogenicity predicts cell essentiality

An important endeavor in human genetics is quantifying the functional significance of a protein for human survival or fitness over evolutionary time. A common approach is to measure, in a healthy population cohort, the depletion in the observed number of variants that likely ablate, or severely disrupt, the function of a protein compared with the expectation under neutral selection (1, 3, 31). However, the reliability of such estimates depends on the expected number of such variants in a gene, which in turn depends on the coding sequence length (3). As noted by the authors of one such approach, LOEUF (loss-of-function observed/expected upper bound fraction) (3), many genes are too small for the metric to be a reliable measure at current sample sizes (22% of protein coding genes; Fig. 4A). Given the observation that the average AlphaMissense pathogenicity of all possible missense variants within a gene is correlated with LOEUF (Spearman correlation: −0.48, P < 2.2 × 10^−16; fig. S4E), we investigated whether AlphaMissense is capable of predicting genes known to be sensitive to functionality-altering perturbations in humans, particularly among ~4000 genes that would otherwise be underpowered in population cohort-based approaches.
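The gene-level score discussed above is a plain average over every possible missense variant in a gene. A sketch of that aggregation plus a decile assignment, assuming (gene, score) pairs as input and following the convention described for Fig. 4 that decile 0 holds the most pathogenic genes, mirroring LOEUF, where low values mean high constraint. The helper names are hypothetical, and ties are broken by input order.

```python
from collections import defaultdict
from statistics import mean


def gene_level_mean(variant_scores):
    """variant_scores: iterable of (gene, pathogenicity) pairs covering all
    possible missense variants. Returns {gene: mean pathogenicity}."""
    per_gene = defaultdict(list)
    for gene, score in variant_scores:
        per_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in per_gene.items()}


def pathogenicity_deciles(gene_means):
    """Assign deciles 0-9, where 0 is the most pathogenic 10% of genes."""
    ranked = sorted(gene_means, key=gene_means.get, reverse=True)
    n = len(ranked)
    return {gene: i * 10 // n for i, gene in enumerate(ranked)}
```

Because the mean is taken over all possible variants rather than observed ones, the score exists for short genes that yield too few expected pLoF variants for LOEUF.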
Overall, we find that a gene’s average 
AlphaMissense pathogenicity shares similar 
properties with LOEUF across a broad range 
of biological measures of intolerance to per- 
turbation in humans, such as depletion in 
observed large structural deletions, and an en- 
richment of genes known to cause severe de- 
velopmental disorders among more-pathogenic 
genes (fig. S7; see supplementary note). Further- 
more, most of the properties of genes in the most- 
pathogenic decile of AlphaMissense predictions 
remain consistent among genes underpowered 
for LOEUF, supporting the generalizability of 
the scores to an additional 4252 small genes 
(fig. S7; see supplementary note). Genes exper- 
imentally identified as essential to cell sur- 
vival across a variety of human cell lines (46) 
showed a strong enrichment among the most- 
pathogenic decile of AlphaMissense. The en- 
richment is both stronger than LOEUF (3.8-fold 
versus 2.3-fold enrichment) for the most- 
pathogenic decile and remains significant 
among smaller genes [5.9-fold, hypergeometric 
P value (Phyper) = 5.6 x 10“; Fig. 4C]. Alpha- 
Missense outperforms LOEUF and PhyloP, a 
conservation-based measure (47), at distinguish- 
ing experimentally determined cell-essential 
from nonessential genes (48) in the context of 
smaller genes (Fig. 4B). AlphaMissense achieves 
an auROC of 0.88 versus 0.81 for LOEUF (P = 
0.001, bootstrap), while maintaining perfor- 
mance among the rest of the proteome (auROC 
of 0.80 versus 0.82, P = 0.092). The advan- 
tage of AlphaMissense in this context is ex- 
emplified by the spliceosome protein complex 
SF3b, which involves a gene with one of the 
highest average AlphaMissense pathogenicity 
scores, PHF5A/SF3B7 (Fig. 4, D to F and data 
S1). All seven primary protein components (49)
are experimentally classified as cell-essential 
(48). Four of these are sufficiently large genes 
for LOEUF to reliably predict their functional 
importance, and they are all strongly depleted
for observed predicted loss of function (pLoF) 
variants (in the lowest decile of LOEUF) (Fig. 
4, E and F). The other three are small genes 
(maximum: 125 amino acids), such that LOEUF 
is too underpowered to be informative. Alpha- 
Missense predicts all three of these subunits 
to be more pathogenic than 96% of all human 
protein coding genes (Fig. 4F). 
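Both the cell-essentiality comparison above and the clinical benchmarks in Fig. 2 report auROC with bootstrap resampling. A minimal sketch, assuming binary labels (1 = cell essential, or pathogenic) and a percentile bootstrap over examples; the exact bootstrap procedure behind the reported CIs and P values is described in the supplementary materials and may differ. The pairwise loop is O(n²), which is fine for a few thousand genes; `sklearn.metrics.roc_auc_score` gives the same point estimate for larger inputs.

```python
import random


def auroc(scores, labels):
    """auROC via the Mann-Whitney statistic: the probability that a random
    positive outscores a random negative, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for auROC, resampling examples with replacement."""
    auroc(scores, labels)  # validates that both classes are present
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        s = [scores[i] for i in sample]
        y = [labels[i] for i in sample]
        if 0 < sum(y) < len(y):  # skip resamples that contain only one class
            stats.append(auroc(s, y))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```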

Together, these observations indicate that methodology combining AlphaMissense predictions with population cohort-based approaches could be beneficial for quantifying functional significance, especially for the large subset of short human genes where population cohort-based approaches lack statistical power.

Fig. 4. AlphaMissense predicts cell essentiality without constraints on sequence length. (A) Distribution of the expected number of pLoF variants per gene under neutral selection within a cohort of 125,000 individuals, as estimated in (3). Briefly, pLoF variants are a class of genetic variants that introduce a major change to the protein coding sequence, such as a premature stop codon, which likely results in loss of protein function [see (3) for a full definition]. We refer to underpowered genes (expected pLoF < 10) as those with insufficient statistical power, as determined in (3), to be classified among the most constrained genes by LOEUF. (B) Performance (auROC) of gene-level scores at classifying cell essentiality among genes underpowered (left) and well-powered (right) for LOEUF. The positive and negative examples for classification are 1247 cell essential genes and 728 cell nonessential genes, respectively, queried from DepMap (48). Within the LOEUF underpowered genes, there are 190 positive and 290 negative examples. Conversely, within the LOEUF powered genes, there are 1084 positive and 438 negative examples (see methods). (C) Distribution of experimentally determined cell essential and cell nonessential genes (46) across the deciles of mean AlphaMissense pathogenicity and LOEUF among genes underpowered for LOEUF (see methods). To be consistent with LOEUF, where low values indicate high gene constraint, AlphaMissense deciles are defined such that low deciles correspond to higher pathogenicity. Error bars show 95% confidence intervals of multinomial proportions. Horizontal gray lines show the percentage of all underpowered genes in each decile bin, which represents the expected percentage if there is no enrichment or depletion of cell essential or nonessential genes. (D) Experimentally determined structure of the SF3b protein complex (PDB ID 5256), a crucial component of the U2 small nuclear ribonucleoprotein (49). The locations of the small protein subunits underpowered for LOEUF are highlighted in purple. (E) Lengths of each protein in SF3b (canonical UniProt isoforms). aa, amino acids. (F) Additional characteristics of SF3b proteins listed in (E). “Cell essential” means that the gene is in the list of “common essential” genes as determined by DepMap (48). “Expected pLoF” is the number of expected pLoF variants under neutral selection within the cohort from which LOEUF is derived [as in (A)]. For “Mean AM decile” and “LOEUF decile,” 0% indicates the most pathogenic or most constrained decile, respectively. For further information, see data S1.

AlphaMissense predictions as a community resource

We have released four resources for the research community. The first is a dataset of 71 million missense variant predictions saturating the human proteome. Each missense variant is defined by the single nucleotide change resulting in a changed amino acid (Fig. 5A). Out of the 71 million missense variants, 32% (22.8 million) are classified as likely pathogenic and 57% (40.9 million) as likely benign, using score cutoffs achieving 90% precision on the ClinVar dataset (fig. S4A). Users can adjust the cutoffs to better match different use cases or accuracy trade-offs, or to achieve the desired precision on a different labeled dataset. The second resource is gene-level AlphaMissense pathogenicity predictions, defined as the average pathogenicity over all possible missense variants in a gene. The third is the expanded dataset of all 216 million possible single amino acid substitutions across the 19,233 canonical human proteins. Finally, we provide predictions for all possible missense variants and amino acid substitutions across 60,000 alternative transcript isoforms for future research and evaluation of isoform-specific effects. These resources benefit from the expanded coverage of confident predictions and have value in several contexts.
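A sketch of the three-way labeling and of the resolved fraction (the share of variants that are not ambiguous), assuming the cutoffs distributed with the public AlphaMissense release, 0.34 and 0.564; treat these values as an assumption to verify against the release you use, and recalibrate them if a different precision target or labeled dataset applies. The function names are hypothetical.

```python
from collections import Counter

# Assumed cutoffs (those shipped with the public AlphaMissense data); the
# paper chooses them so that labels reach 90% precision on ClinVar.
LIKELY_BENIGN_BELOW = 0.34
LIKELY_PATHOGENIC_ABOVE = 0.564


def classify(score, benign_below=LIKELY_BENIGN_BELOW,
             pathogenic_above=LIKELY_PATHOGENIC_ABOVE):
    """Map a calibrated pathogenicity score to one of three labels."""
    if score < benign_below:
        return "likely_benign"
    if score > pathogenic_above:
        return "likely_pathogenic"
    return "ambiguous"


def class_fractions(scores, **cutoffs):
    """Fraction of variants per label, plus the resolved (non-ambiguous) share."""
    counts = Counter(classify(s, **cutoffs) for s in scores)
    n = len(scores)
    out = {label: counts[label] / n
           for label in ("likely_benign", "ambiguous", "likely_pathogenic")}
    out["resolved"] = 1.0 - out["ambiguous"]
    return out
```

Passing different `benign_below` or `pathogenic_above` values is the hook for the user-adjusted cutoffs mentioned above.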

The predictions of all possible missense variants could assist clinicians in prioritizing variants for rare disease diagnostics, as they substantially increase the coverage of confidently classified missense variants (which would otherwise remain variants of unknown significance) without being biased toward the existing human curation process or well-studied genes. Out of 69.5 million variants unobserved in gnomAD, we were able to make a confident prediction for 61.7 million (88.8%) missense variants by classifying them as likely benign (38.9 million, 56.0%) or likely pathogenic (22.8 million, 32.8%) (Fig. 5B). This coverage and predictive performance on ClinVar remain high among clinically actionable genes prioritized by ACMG (32) (88.9% resolved, average auROC of 0.959; fig. S3A). The fraction of predicted pathogenic variants decreases with increasing allele frequency, as expected under purifying selection (Fig. 5B). Among variants absent from gnomAD, the MAVE- and ACMG-prioritized proteins have a higher proportion of predicted pathogenic variants than the proteome as a whole (38.4% and 36.8%, respectively, versus 32.8%), in line with the high evolutionary constraint and functional importance of these two gene sets (Fig. 5B and fig. S3, A and B).

22 September 2023

This resource could also inform studies of complex trait genetics (50, 51). We compared the proportion of rare variants (MAF < 0.01) that are statistically associated with any of ~4000 traits in the UK Biobank (52) for different classes of variation (see methods). We found that missense variants predicted as likely pathogenic by AlphaMissense showed twice the rate of trait associations of synonymous variants (Fig. 5C and fig. S8A), and that the rate of associations among predicted likely pathogenic variants is statistically indistinguishable (P = 0.43, Fisher exact test) from that of pLoF variants (Fig. 5C). In contrast, the rates among both ambiguous and likely benign variant sets are significantly lower (P < 0.05, Fisher exact test), with likely benign variants having the rate most similar to that of synonymous variants (Fig. 5C). By combining variants from both the AlphaMissense pathogenic and pLoF categories, we increase the number of candidate deleterious rare variants by 3.2-fold, translating to ~7000 additional genes that would be testable in gene-level association analyses in large-scale cohorts such as UK Biobank (2) (cumulative allele count > 50 in UK Biobank; fig. S8B). As such, our annotation of missense variants could be a powerful tool for discovery of previously unknown genes underlying complex traits (50, 51).

The predictions of all possible amino acid substitutions are intended for studying the full range of single-residue perturbations. For example, predictions could be used as a starting point for designing and interpreting experiments that probe saturating amino acid substitutions across the human proteome, as performed by the MAVE community. Such scores can be used alongside the AlphaFold Structure Database (21, 53) to assess the predicted pathogenicity in the context of predicted protein structures for every single human protein. Together, AlphaMissense predictions have the potential to accelerate our understanding of the molecular effects of variants on protein function, contribute to the discovery of disease-causing genes, and increase the diagnostic yield of rare genetic diseases.

Fig. 5. AlphaMissense predictions as a community resource. (A) Example row from the AlphaMissense (AM) proteome-wide pathogenicity prediction dataset. (B) Pathogenicity class proportions for different variant MAF ranges in gnomAD (top) and two different gene sets (bottom): prioritized genes for MAVE studies in (33) and clinically actionable genes prioritized by ACMG (32). (C) The proportion of rare variants (MAF < 0.01) that have a statistical association (P < 1 x 10°°) with at least one of ~4000 UK Biobank traits (see methods). Counts above each variant set are the number of variants with an association over the total number in that set. An asterisk indicates that the proportion is significantly different from the pLoF set (Fisher exact test, P < 0.05), and a minus sign indicates that there is no evidence of a difference.
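The association-rate comparisons in Fig. 5C rely on a Fisher exact test over 2×2 tables of counts (associated versus not associated, for two variant sets). A stdlib sketch of the two-sided test, assuming the common definition that sums the probabilities of all tables with the same margins that are no more likely than the observed one. The function name is hypothetical; for counts on the scale of UK Biobank, `scipy.stats.fisher_exact` is the practical choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for a 2x2 table of counts:

                    associated   not associated
        set 1           a              b
        set 2           c              d

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that set 1 holds x of the col1 associated variants.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so that tables tied with the observed one count.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```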


Materials and methods summary 

Full details of the methods are described in the supplementary materials and are summarized here. The model architecture is similar to that of AlphaFold (27), with minor modifications. AlphaMissense was trained in two stages: structure pretraining and variant fine-tuning. The pretraining stage is the same as described in AlphaFold, except with higher weights on the masked MSA reconstruction loss. During fine-tuning, the model is optimized to predict both variant pathogenicity and the structure of the reference sequence. The benign training variants are derived from observed variants in human and primate species following the PrimateAI approach (12). Pathogenic training variants are sampled from unobserved variants with sampling weights that depend on the trinucleotide context and the gene. A small subset of ClinVar variants (1263 pathogenic and 1263 benign) is used as the evaluation set for model selection and hyperparameter optimization. The variant effect prediction score is defined as the log-likelihood difference between the reference amino acid and the alternative amino acid. The final model predictions are the average of six models: three independently trained models (with minor hyperparameter differences), each run twice, once with diversity filtering on the MSA and once without. Raw model prediction scores are calibrated with the ClinVar evaluation set to represent approximate probabilities (we refer to the calibrated scores as AlphaMissense pathogenicity). Finally, we define threshold score values to interpret a variant as “likely pathogenic,” “ambiguous,” or “likely benign.” These values are derived such that the labels are assigned with 90% precision on ClinVar variants, following the approach of EVE (17). The model performance was compared with previous computational methods using multiple evaluation datasets: ClinVar (30), de novo variants from the DDD cohort (54), cancer hotspot mutations (10), MAVE data of 72 proteins from ProteinGym (https://www.proteingym.org/), and additional MAVE data of 20 proteins collected from the literature. Transcript-level mean AlphaMissense pathogenicity is calculated by averaging the pathogenicity scores of all possible single-nucleotide missense variants per transcript. For methods associated with the analysis of properties of the model outputs beyond the primary evaluation metrics (e.g., relationship with allele frequencies and cell essentiality), we refer readers to the supplementary materials.
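The raw variant score is defined above as a log-likelihood difference between the reference and alternative amino acids, and the final prediction as an average over six model runs. A sketch of both steps, assuming each run emits 20 logits per position in alphabetical one-letter order `ACDEFGHIKLMNPQRSTVWY` (an assumption; the model's actual output layout is not described here). The helper names are hypothetical, and the subsequent calibration to approximate probabilities is not shown. The softmax normalizer cancels in the difference, so the score equals the difference of the two logits; the explicit log-softmax is kept only to mirror the definition.

```python
import math
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed logit order


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def variant_score(logits, ref, alt):
    """log p(ref) - log p(alt) at one position for one model run.

    Larger values mean the model favors the reference residue more strongly,
    i.e., the substitution looks more damaging.
    """
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per standard amino acid")
    lp = log_softmax(logits)
    return lp[AMINO_ACIDS.index(ref)] - lp[AMINO_ACIDS.index(alt)]


def ensemble_score(per_model_logits, ref, alt):
    """Average the raw score over model runs (six in the paper)."""
    return mean(variant_score(l, ref, alt) for l in per_model_logits)
```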


REFERENCES AND NOTES 


Cheng et al., Science 381, eadg7492 (2023)

1. M. Lek et al., Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016). doi: 10.1038/nature19057; pmid: 27535533
2. C. Bycroft et al., The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203-209 (2018). doi: 10.1038/s41586-018-0579-z; pmid: 30305743
3. K. J. Karczewski et al., The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020). doi: 10.1038/s41586-020-2308-7; pmid: 32461654
4. D. M. Fowler, S. Fields, Deep mutational scanning: A new style of protein science. Nat. Methods 11, 801-807 (2014). doi: 10.1038/nmeth.3027; pmid: 25075907
5. G. M. Findlay et al., Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217-222 (2018). doi: 10.1038/s41586-018-0461-z; pmid: 30209399
6. AVE Alliance Founding Members, The Atlas of Variant Effects (AVE) Alliance: understanding genetic variation at nucleotide resolution, version 4a, Zenodo (2021); https://doi.org/10.5281/zenodo.7508716.
7. I. Adzhubei, D. M. Jordan, S. R. Sunyaev, Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 76, 7.20.1-7.20.41 (2013). doi: 10.1002/0471142905.hg0720s76; pmid: 23315928
8. N. M. Ioannidis et al., REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877-885 (2016). doi: 10.1016/j.ajhg.2016.08.016; pmid: 27666373
9. Y. Wu et al., Improved pathogenicity prediction for rare human missense variants. Am. J. Hum. Genet. 108, 1891-1906 (2021). doi: 10.1016/j.ajhg.2021.08.012; pmid: 34551312
10. H. Zhang, M. S. Xu, X. Fan, W. K. Chung, Y. Shen, Predicting functional effect of missense variants using graph attention neural networks. Nat. Mach. Intell. 4, 1017-1028 (2022). doi: 10.1038/s42256-022-00561-w; pmid: 37484202
11. D. G. Grimm et al., The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat. 36, 513-523 (2015). doi: 10.1002/humu.22768; pmid: 25684150
12. L. Sundaram et al., Predicting the clinical impact of human mutation with deep neural networks. Nat. Genet. 50, 1161-1170 (2018). doi: 10.1038/s41588-018-0167-z; pmid: 30038395
13. M. Kircher et al., A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310-315 (2014). doi: 10.1038/ng.2892; pmid: 24487276


14. P. C. Ng, S. Henikoff, SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812-3814 (2003). doi: 10.1093/nar/gkg509; pmid: 12824425
15. T. A. Hopf et al., Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128-135 (2017). doi: 10.1038/nbt.3769; pmid: 28092658
16. E. Laine, Y. Karami, A. Carbone, GEMME: A simple and fast global epistatic model predicting mutational effects. Mol. Biol. Evol. 36, 2604-2619 (2019). doi: 10.1093/molbev/msz179; pmid: 31406981
17. J. Frazer et al., Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91-95 (2021). doi: 10.1038/s41586-021-04043-8; pmid: 34707284
18. J. Meier et al., Language models enable zero-shot prediction of the effects of mutations on protein function. bioRxiv 2021.07.09.450648 [Preprint] (2021); https://doi.org/10.1101/2021.07.09.450648.
19. P. Notin et al., Tranception: Protein fitness prediction with autoregressive transformers and inference-time retrieval. Proc. Mach. Learn. Res. 162, 16990-17017 (2022).
20. Z. Lin et al., Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123-1130 (2023). doi: 10.1126/science.ade2574; pmid: 36927031
21. J. Jumper et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). doi: 10.1038/s41586-021-03819-2; pmid: 34265844
22. S. Ittisoponpisan et al., Can predicted protein 3D structures provide reliable insights into whether missense variants are disease associated? J. Mol. Biol. 431, 2197-2212 (2019). doi: 10.1016/j.jmb.2019.04.009; pmid: 30995449
23. A. Schmidt et al., Predicting the pathogenicity of missense variants using features derived from AlphaFold2. Bioinformatics 39, btad280 (2023). doi: 10.1093/bioinformatics/btad280; pmid: 37084271
24. B. Li, D. M. Roden, J. A. Capra, The 3D mutational constraint on amino acid sites in the human proteome. Nat. Commun. 13, 3273 (2022). doi: 10.1038/s41467-022-30936-x; pmid: 35672414
25. K. Tunyasuvunakool et al., Highly accurate protein structure prediction for the human proteome. Nature 596, 590-596 (2021). doi: 10.1038/s41586-021-03828-1; pmid: 34293799
26. G. R. Buel, K. J. Walters, Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1-2 (2022). doi: 10.1038/s41594-021-00714-2; pmid: 35046575
27. R. M. Rao et al., MSA Transformer. Proc. Mach. Learn. Res. 139, 8844-8856 (2021).
28. A. Rives et al., Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U.S.A. 118, e2016239118 (2021). doi: 10.1073/pnas.2016239118; pmid: 33876751
29. B. J. Livesey, J. A. Marsh, Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations. Mol. Syst. Biol. 16, e9380 (2020). doi: 10.15252/msb.20199380; pmid: 32627955
30. M. J. Landrum et al., ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062-D1067 (2018). doi: 10.1093/nar/gkx1153; pmid: 29165669
31. J. M. Havrilla, B. S. Pedersen, R. M. Layer, A. R. Quinlan, A map of constrained coding regions in the human genome. Nat. Genet. 51, 88-95 (2019). doi: 10.1038/s41588-018-0294-6; pmid: 30531870
32. D. T. Miller et al., ACMG SF v3.1 list for reporting of secondary findings in clinical exome and genome sequencing: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 24, 1407-1414 (2022). doi: 10.1016/j.gim.2022.04.006; pmid: 35802134
33. D. Kuang et al., Prioritizing genes for systematic variant effect mapping. Bioinformatics 36, 5448-5455 (2021). doi: 10.1093/bioinformatics/btaa1008; pmid: 33300982
34. S. Richards et al., Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405-424 (2015). doi: 10.1038/gim.2015.30; pmid: 25741868
35. M. H. Høie, M. Cagiada, A. H. Beck Frederiksen, A. Stein, K. Lindorff-Larsen, Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep. 38, 110207 (2022). doi: 10.1016/j.celrep.2021.110207; pmid: 35021073


22 September 2023 


36. K. A. Matreyek et al., Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874-882 (2018). doi: 10.1038/s41588-018-0122-z; pmid: 29785012
37. A. Laddach, J. C. F. Ng, F. Fraternali, Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants. PLOS Biol. 19, e3001207 (2021). doi: 10.1371/journal.pbio.3001207; pmid: 33909605
38. D. Munro, M. Singh, DeMaSk: A deep mutational scanning substitution matrix and its use for variant impact prediction. Bioinformatics 36, 5322-5329 (2021). doi: 10.1093/bioinformatics/btaa1030; pmid: 33325500
39. S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89, 10915-10919 (1992). doi: 10.1073/pnas.89.22.10915; pmid: 1438297
40. S. Fayer et al., Closing the gap: Systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248-2258 (2021). doi: 10.1016/j.ajhg.2021.11.001; pmid: 34793697
41. B. J. Livesey, J. A. Marsh, Updated benchmarking of variant effect predictors using deep mutational scanning. Mol. Syst. Biol. 19, e11474 (2023). doi: 10.15252/msb.202211474; pmid: 37310135
42. J. J. Kwon, W. C. Hahn, A leucine-rich repeat protein provides a SHOC2 the RAS circuit: a structure-function perspective. Mol. Cell. Biol. 41, e00627-20 (2021). doi: 10.1128/MCB.00627-20; pmid: 33526449
43. J. J. Kwon et al., Structure-function analysis of the SHOC2-MRAS-PP1C holophosphatase complex. Nature 609, 408-415 (2022). doi: 10.1038/s41586-022-04928-2; pmid: 35831509
44. S. M. Sternisha, B. G. Miller, Molecular and cellular regulation of human glucokinase. Arch. Biochem. Biophys. 663, 199-213 (2019). doi: 10.1016/j.abb.2019.01.011; pmid: 30641049
45. S. Gersing et al., A comprehensive map of human glucokinase variant activity. Genome Biol. 24, 97 (2023). doi: 10.1186/s13059-023-02935-8; pmid: 37101203
46. T. Hart et al., Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 7, 2719-2727 (2017). doi: 10.1534/g3.117.041277; pmid: 28655737
47. K. S. Pollard, M. J. Hubisz, K. R. Rosenbloom, A. Siepel, Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110-121 (2010). doi: 10.1101/gr.097857.109; pmid: 19858363
48. J. M. Dempster et al., Extracting biological insights from the Project Achilles genome-scale CRISPR screens in cancer cell lines. bioRxiv 720243 [Preprint] (2019); https://doi.org/10.1101/720243.
49. C. Sun, The SF3b complex: Splicing and beyond. Cell. Mol. Life Sci. 77, 3583-3595 (2020). doi: 10.1007/s00018-020-03493-z; pmid: 32140746
50. L. Bomba, K. Walter, N. Soranzo, The impact of rare and low-frequency genetic variants in common disease. Genome Biol. 18, 77 (2017). doi: 10.1186/s13059-017-1212-4; pmid: 28449691
51. S. Lee, G. R. Abecasis, M. Boehnke, X. Lin, Rare-variant association analysis: Study designs and statistical tests. Am. J. Hum. Genet. 95, 5-23 (2014). doi: 10.1016/j.ajhg.2014.06.009; pmid: 24995866
52. K. J. Karczewski et al., Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genomics 2, 100168 (2022). doi: 10.1016/j.xgen.2022.100168; pmid: 36778668
53. M. Varadi et al., AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439-D444 (2022). doi: 10.1093/nar/gkab1061; pmid: 34791371
54. Deciphering Developmental Disorders Study, Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433-438 (2017). doi: 10.1038/nature21062; pmid: 28135719
55. P. Petit et al., The active conformation of human glucokinase is not altered by allosteric activators. Acta Crystallogr. D 67, 929-935 (2011). doi: 10.1107/S0907444911036729; pmid: 22101819
56. A. L. Gloyn et al., in Glucokinase and Glycemic Disease: From Basics to Novel Therapeutics, F. M. Matschinsky, M. A. Magnuson, Eds., vol. 16 of Frontiers in Diabetes (S. Karger AG, 2004), pp. 92-109.

RESEARCH | RESEARCH ARTICLE

57. J. Cheng et al., Source code for AlphaMissense, version 1.0.0, 
Zenodo (2023); https://doi.org/10.5281/zenodo.8208697. 

58. J. Cheng et al., Predictions of AlphaMissense, version 1.0.0, 
Zenodo (2023); https://doi.org/10.5281/zenodo.8208688. 


ACKNOWLEDGMENTS 


We thank K. Tunyasuvunakool, R. Fergus, and E. Papa for their insights and manuscript reviews; D. La for feedback on structural representations and analyses; Z. Wu and S.-J. Dunn for their contributions at the early stage of the project; the Research Platform colleagues for their continuous support; and other colleagues at DeepMind and Google for their encouragement and support. This research has been conducted using summary statistics generated from the UK Biobank resource (under applications 26041 and 48511), accessed at https://app.genebass.org/ (52). Funding: All research in this study was funded by DeepMind and Alphabet. There was no external funding. Author contributions: J.C. and Z.A. conceptualized the study with input from J.J., A.W.S., P.K., and D.H.; J.C. and Z.A. managed and supervised the project; J.C. and G.N. developed the model with input from A.P., Z.A., J.J., and A.W.S.; J.C., G.N., Z.A., A.Z., and R.G.S. developed the data pipeline; J.C. and G.N. performed modeling experiments with help from Z.A.; J.C., G.N., J.P., C.B., T.A., and Z.A. analyzed data, prepared figures, and wrote the manuscript; J.C. and T.A. developed software infrastructure for model inference; T.A. developed software for data ingestion (complex traits, competitor methods) and generated genome-to-proteome maps with support from A.Z.; J.P. analyzed multiplexed assays of variant effect (MAVEs) data with help from J.C.; J.P. curated and analyzed structural data; C.B. and T.A. conceived of and executed analysis of complex traits; C.B. conceived of and executed analysis of gene constraint and cell essentiality with help from J.P.; T.A. performed inference for ESM1v with help from A.Z.; A.Z. helped with software infrastructure and generated structure visualization utilities; L.H.W. reviewed the DMS literature, annotated ProteinGym data, and managed and coordinated project planning and execution; M.Z. reviewed code, provided feedback on the methodology, and contributed to the proteome-wide analysis; T.S. contributed to the training data preparation and software infrastructure; D.H. and P.K. contributed to management and supervision; J.C., G.N., J.P., C.B., T.A., A.Z., A.P., L.H.W., M.Z., T.S., A.W.S., and Z.A. edited the manuscript. All authors reviewed the manuscript. Competing interests: This work was done in the course of employment at DeepMind, with no other competing financial interests. J.C., G.N., and Z.A. have filed provisional patent applications relating to machine learning for predicting missense variant effects (US Provisional Patent nos. 63/415,117 and 63/479,653). Data and materials availability: The source code of AlphaMissense is available at Zenodo (57) and https://github.com/deepmind/alphamissense. Predictions for all human missense variants and amino acid substitutions are available at Zenodo (58) or https://console.cloud.google.com/storage/browser/dm_alphamissense. Researchers interested in predictions not yet provided, and for noncommercial use, can send an expression of interest to alphamissense@google.com. As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications. License information: Copyright © 2023 the authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original US government works. https://www.science.org/about/science-licenses-journal-article-reuse


SUPPLEMENTARY MATERIALS 


science.org/doi/10.1126/science.adg7492 
Supplementary Note 

Materials and Methods 

Figs. S1 to S10 

Tables S1 to S6 

References (59-100) 

MDAR Reproducibility Checklist 

Data S1 to S9 


Submitted 19 January 2023; accepted 23 August 2023 
Published online 19 September 2023 
10.1126/science.adg7492 




RESEARCH 


RESEARCH ARTICLE SUMMARY 


MOLECULAR BIOLOGY 


In silico protein interaction screening uncovers 
DONSON’s role in replication initiation 


Yang Lim†, Lukas Tamayo-Orrego†, Ernst Schmid, Zygimante Tarnauskaite†, Olga V. Kochenova,
Rhian Gruar, Sachiko Muramatsu, Luke Lynch, Aitana Verdu Schlie, Paula L. Carroll, Gheorghe Chistol,
Martin A. M. Reijns, Masato T. Kanemaki, Andrew P. Jackson*, Johannes C. Walter*


INTRODUCTION: Rapid and faithful DNA repli- 
cation maintains genome integrity and is often 
disrupted in human diseases such as cancer. A 
key event during replication initiation is the 
assembly of the CDC45-MCM2-7-GINS (CMG) 
helicase, which unwinds DNA at the fork and 
is composed of the hexameric MCM2-7 ATPase 
and two accessory factors, CDC45 and GINS. 
After MCM2-7 double hexamers are recruited 
to origins in G1 and cells enter S phase, CDC45 
and GINS associate sequentially with MCM2-7 
double hexamers, leading to CMG assembly. 
In yeast, GINS is escorted to MCM double 
hexamers by a “pre-loading complex” (pre-LC) 
containing Dpb11, Sld2, and DNA polymerase
ε (Pol ε), but how GINS is recruited to origins
in metazoa remains unclear. In addition, multicellular organisms contain proteins such as
Downstream Neighbor of SON (DONSON) 
that impact replication through unknown 
mechanisms. 


RATIONALE: We wanted to understand how 
DONSON, which is mutated in microcephalic 
dwarfism, contributes to DNA replication. To 
this end, we used a combination of biochem- 
istry, cell biology, mouse genetics, and in silico 
screening for novel protein-protein interactions. 


[Figure: (A) Workflow: protein of interest (DONSON), screen for interactors using AlphaFold-Multimer, mechanistic hypothesis, experimental validation. (B) Molecular model (legend: DONSON, TOPBP1, POLE2): a dimeric pre-loading complex (pre-LC) docks onto the CDC45-MCM double hexamer; GINS + Pol ε delivery and TOPBP1 departure yield CMG + Pol ε (DONSON still bound).]

DONSON scaffolds a pre-loading complex that delivers GINS to CDC45-MCM2-7 for CMG assembly.
(A) Steps taken to generate and validate a model of how DONSON promotes CMG assembly. (B) An AlphaFold model of the dimeric pre-LC is shown docking onto the CDC45-MCM2-7 double hexamer, leading to formation of two CMGE (CMG + Pol ε) complexes. We propose that pre-LC docking requires separation of the two MCMs and rotation of one MCM relative to the other. Whether DONSON remains associated with CMGE, as shown, is unclear. Several subunits of Pol ε and most domains of TOPBP1 are not shown, nor are the flexible loops connecting the different parts of DONSON and TOPBP1.

RESULTS: We first immunodepleted DONSON
from cell-free Xenopus laevis egg extracts, which
recapitulate genome maintenance processes.
We found that in the absence of DONSON,
DNA replication and CMG assembly were abolished. Using an inducible DONSON
allele, we showed that DONSON is also essential for CMG assembly and DNA replication in human cells. To explore how DONSON
promotes CMG formation, we searched the 
entire replisome for DONSON-interacting 
proteins using AlphaFold-Multimer (AF-M). 
This focused in silico screen identified GINS, 
TOPBP1 (Dpb11 ortholog), POLE2 (Pol ε subunit), MCM3 (MCM2-7 subunit), and DONSON
itself as high confidence DONSON interactors. 
Based on these predictions, we postulated 
that DONSON scaffolds a dimeric vertebrate 
pre-LC containing two copies of DONSON, 
TOPBP1, GINS, and Pol ε, and that this pre-LC
positions GINS on CDC45-MCM2-7 for CMG 
assembly (see figure). Consistent with this 
model, biochemical studies confirmed that 
DONSON’s interactions with TOPBP1 and 
GINS are essential for CMG formation and
DNA replication. On the other hand, DONSON’s 
interaction with MCM3 was dispensable for 
pre-LC assembly and instead promoted pre- 
LC docking onto MCM double hexamers. A 
DONSON mutation that causes microcephalic 
dwarfism in humans recapitulated this condition in mice and compromised both replication initiation and CMG assembly. Finally,
when we screened the entire human proteome 
for DONSON interactors using AF-M and additional criteria, CMG assembly factors were
highly enriched, suggesting the potential of in 
silico screening for ab initio protein function 
identification. 


CONCLUSION: We identify DONSON as a factor 
required for vertebrate CMG assembly. It func- 
tions by organizing a pre-LC that delivers 
GINS to its binding site on CDC45-MCM2-7, 
leading to CMG assembly. Our results suggest 
that although all eukaryotes contain a pre-LC, 
their architectures differ, with vertebrates using
DONSON instead of Sld2. Our results also 
implicate defective CMG assembly in the 
pathophysiology of microcephalic dwarfism.
We propose that impaired replication initia- 
tion contributes to slow cell cycle progression, 
leading to the hypocellularity seen in this form 
of dwarfism. Our results illustrate that large- 
scale in silico protein-protein interaction screen- 
ing using AF-M is an effective method to identify 
functionally relevant interactions and formulate 
testable molecular hypotheses. 
CDC45-MCM2-7-GINS (CMG) helicase assembly is the central event in eukaryotic replication initiation.
In yeast, a multi-subunit “pre-loading complex” (pre-LC) accompanies GINS to chromatin-bound
MCM2-7, leading to CMG formation. Here, we report that DONSON, a metazoan protein mutated in
microcephalic primordial dwarfism, is required for CMG assembly in vertebrates. Using AlphaFold to
screen for protein-protein interactions followed by experimental validation, we show that DONSON
scaffolds a vertebrate pre-LC containing GINS, TOPBP1, and DNA pol ε. Our evidence suggests that
DONSON docks the pre-LC onto MCM2-7, delivering GINS to its binding site in CMG. A patient-derived
DONSON mutation compromises CMG assembly and recapitulates microcephalic dwarfism in mice.
These results unify our understanding of eukaryotic replication initiation, implicate defective CMG
assembly in microcephalic dwarfism, and illustrate how in silico protein-protein interaction screening
accelerates mechanistic discovery.


Rapid and faithful DNA replication is necessary for cell proliferation and the
maintenance of genome integrity, and its disruption causes cancer and many
inherited human diseases. A key component of the replisome is the replicative CMG
helicase, which is composed of CDC45, MCM2-7, 
and GINS, and unwinds DNA at the replication 
fork. CMG assembly involves several discrete 
steps and is best understood in yeast (Fig. 1A) 
(2). In the G1 phase, the hexameric MCM2-7 
ATPase is loaded onto origins of replication in 
a head-to-head orientation (“double hexamers”), 
a process called licensing. Subsequently, the 
Sld3-Sld7 complex binds to Dbf4-dependent 
kinase (DDK)-phosphorylated MCM2-7 dou- 
ble hexamers on chromatin and recruits Cdc45. 
In parallel, CDK phosphorylation of Sld2 me- 
diates Sld2 binding to Dpb11, promoting the 
assembly of a “pre-loading complex” (pre-LC) 
that also contains GINS and Pol ε (2-5). CDK
also phosphorylates Sld3, and binding of Dpb11 
to phosphorylated Sld3 allows docking of the 
pre-LC onto Cdc45-MCM2-7, delivering GINS 
for stable CMG assembly. Notably, Cdc45 binds 
MCM2-7 only weakly until GINS is recruited, 
when a stable CMG complex forms (5-7). Once 
assembled, CMGs are activated for DNA un- 
winding by MCM10, followed by replisome 
assembly and bi-directional DNA replication. 
Despite these insights, a molecular understand- 
ing of how the pre-LC promotes GINS associ- 
ation with Cdc45-MCM2-7 is still lacking, even 
in yeast, primarily because of the absence of 
relevant structural information (1). 

The mechanism of CMG assembly in meta- 
zoans is broadly similar to that observed in 
yeast. TRESLIN and MTBP are thought to per- 
form the same role in Cdc45 recruitment as 
their yeast counterparts, Sld3 and Sld7 (Fig. 1B) 
(8-12). Furthermore, analogous to Sld3’s interaction with Dpb11, CDK-phosphorylated
TRESLIN binds to Dpb11’s ortholog, TOPBP1
(8, 13-15). TOPBP1 in turn contacts GINS (16).
However, a central mystery concerns the metazoan counterpart of Sld2 because RECQL4, the
closest vertebrate homolog of Sld2, functions
after CMG assembly (17, 18). Furthermore, although Pol ε is an essential component of the
pre-LC in yeast, this polymerase is dispensable 
for vertebrate CMG assembly (19, 20). Finally, 
there remain several unanswered questions 
including whether a vertebrate pre-LC exists, 
how it might be organized, and how GINS is 
delivered in vertebrates. 

Microcephalic dwarfism comprises a family 
of monogenic disorders of extreme growth 
failure that result from disruption of cellular
proliferation (21). Many genes implicated in
microcephalic dwarfism act in DNA replica- 
tion and encode licensing factors, components 
of the CMG helicase, DNA polymerases, and 
replication stress response factors (22). Muta- 
tions in the Downstream Neighbor of SON 
(DONSON) protein also cause microcephalic 
dwarfism, including Meier-Gorlin syndrome, 
a disorder specifically associated with replica- 
tion initiation genes (23, 24). Phenotypic analy- 
sis in Drosophila suggested that DONSON 
plays a role in DNA replication (25). Similar 
to the fly protein, human DONSON expression 
peaks in S phase. It also localizes to sites of 
replication and coimmunoprecipitates with 
several replisome components, including CMG 
components (23, 26). However, in mammals, 
studies using small interfering RNA (siRNA) 
and patient-derived cell lines suggested roles
for DONSON in maintaining replication fork 
stability, ATR signaling, and replicative traverse 
of DNA interstrand crosslinks (23, 26). Thus, a 
clear picture of DONSON’s role in genome main- 
tenance has not emerged. Here, we show that 
DONSON organizes a vertebrate pre-LC that 
delivers GINS to its binding site in CMG and 
we implicate defective CMG assembly in the 
pathophysiology of microcephalic dwarfism. 


Results 
DONSON is required for CMG assembly in frog 
egg extracts 


To assess DONSON’s role in genome mainte- 
nance, we used nucleus-free Xenopus laevis 
egg extracts, which faithfully recapitulate DNA 
replication and the replication stress response 
(27). Plasmid DNA is first incubated with a 
high-speed supernatant (HSS) of total egg 
lysate, which promotes replication licensing 
(fig. S1A). Subsequent addition of a concen- 
trated nucleoplasmic extract (NPE) leads to 
CMG assembly and a complete round of DNA 
replication that can be monitored through 
[α-³²P]dATP incorporation. Immunodepletion
of DONSON from HSS and NPE abolished 
DNA replication, which was partially rescued 
by re-addition of bacterially expressed DONSON 
(fig. S1, B and C). Partial rescue was explained 
by the fact that depletion of DONSON co- 
depleted roughly half of the CDK2-Cyclin E (fig. 
S1, D to E), which is rate-limiting for repli- 
cation in nucleus-free egg extracts (27-30). 
Indeed, when we supplemented DONSON-depleted extracts with recombinant CDK2-Cyclin
E1 (+CDK2-Cyclin E1) (fig. S1E), recombinant
DONSON fully rescued replication (Fig. 1C). 
Thus, DONSON is required for vertebrate, cell- 
free DNA replication. 
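As a rough check on why CDK2-Cyclin E supplementation allows full rescue, the concentrations given in the Fig. 1 legend can be combined. This worked estimate assumes that "roughly half" means exactly 50% co-depletion and that the recombinant human kinase is as active as the endogenous frog enzyme; both are simplifying assumptions, not measurements from the study. With $c$ the endogenous CDK2-Cyclin E concentration (0.5 to 1 µM):

```latex
\begin{aligned}
[\text{CDK2-Cyclin E}]_{\Delta\text{DONSON}} &\approx \tfrac{1}{2}\,c = 0.25 \text{ to } 0.5~\mu\text{M},\\
[\text{CDK2-Cyclin E}]_{\Delta\text{DONSON} + \text{rCDK2}} &\approx \tfrac{1}{2}\,c + 0.3 = 0.55 \text{ to } 0.8~\mu\text{M},\\
\frac{\tfrac{1}{2}\,c + 0.3}{c}\bigg|_{c = 0.5} = 1.1, &\qquad
\frac{\tfrac{1}{2}\,c + 0.3}{c}\bigg|_{c = 1} = 0.8 .
\end{aligned}
```

Under these assumptions, the supplemented ΔDONSON extract contains roughly 80 to 110% of the endogenous CDK2-Cyclin E level.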

To determine which replication step is depen- 
dent on DONSON, we performed chromatin pull- 
down experiments. As shown in Fig. 1D, licensing, 
as measured by MCM7 chromatin binding, was 
unaffected by DONSON depletion or add-back 
(lanes 5, 7, and 9). Moreover, TRESLIN-MTBP
recruitment did not depend on DONSON, suggesting that DONSON is not required for initial
steps in CMG assembly (fig. S1F, lanes 6 and 8;
fig. S1G, lanes 8 and 10). By contrast, CDC45,
GINS, Pol ε, Pol α, and proliferating cell nuclear
antigen (PCNA) recruitment, which occurred
within 10 min of NPE addition, failed in
DONSON-depleted extract, but their recruitment was restored with recombinant DONSON
(Fig. 1D, lanes 6, 8, and 10). DONSON depletion also abolished CMG assembly in extracts
lacking replication protein A (RPA) (Fig. 1E,
lanes 7 and 8). RPA is required for origin unwinding and replication elongation (31, 32), as
seen from defective PCNA and Pol α loading
and the persistence of CDC45, GINS, and Pol ε
on chromatin at the 40-min point (Fig. 1E, lanes
5 and 6; histone H3 loading is low compared
with mock-depleted extract due to deficient
replication). Thus, DONSON is required for
de novo CMG assembly, independently of any
effects on CMG stability during the subsequent,
RPA-dependent unwinding and elongation
phases of replication initiation. The defects
seen upon DONSON depletion mirrored the effect of adding the CDK2 inhibitor p27Kip1 (33, 34)
(Fig. 1D, lane 4, CDKi), suggesting that DONSON
functions at the final stage of CMG assembly.
Consistent with this model, DONSON binding
to chromatin was blocked by CDKi, DDKi, and
geminin, an inhibitor of origin licensing (Fig. 1F).
Collectively, our results show that frog DONSON
functions after TRESLIN-MTBP loading onto
licensed chromatin but before assembly of a stable
CMG helicase containing GINS and CDC45.

Fig. 1. DONSON is required for CMG assembly and DNA replication. (A and B) Models of CMG assembly in budding yeast and vertebrates. (C) Relative DNA replication efficiency in the indicated egg extracts. Because depletion of DONSON co-depletes roughly half of the 0.5 to 1 µM endogenous CDK2-Cyclin E, DONSON-depleted (ΔDONSON) extracts but not mock-depleted extracts were supplemented with 0.3 µM recombinant human CDK2-Cyclin E1. Recombinant DONSON (rDONSON, fig. S1B) was added where indicated. Datapoints, n = 3 experiments. Mean ± SD. A representative western blot of total protein levels in these reactions is shown in fig. S1E. (D) Plasmid DNA was incubated in the indicated egg extracts. At the specified times following NPE addition, chromatin was recovered and blotted for the indicated proteins. DONS, DONSON; Gem, geminin; CDKi, p27Kip1. A representative western blot of total protein levels in these reactions is shown in fig. S1E. (E) Plasmid DNA was incubated in extracts depleted of DONSON and/or RPA. At the specified times following NPE addition, chromatin was recovered and blotted for the indicated proteins. Western blot of total protein levels in these reactions is shown in fig. S1H. (F) In the presence of the indicated inhibitors of replication initiation, plasmid DNA was recovered 10 minutes after NPE addition and blotted for the indicated proteins. DDKi, PHA-767491.


A model for DONSON function based on in silico 
protein-protein interaction screening 


We next used recent advances in structure 
prediction to address how DONSON promotes 
CMG assembly. AlphaFold2 predicts that 
DONSON contains a ~150-residue disordered 
N-terminal tail, a globular domain, and an 
~80-residue loop protruding from the globular 
domain (Fig. 2A). However, this structure on 
its own offers no insight into DONSON function.

Fig. 2. Hypothetical model of DONSON function in CMG assembly. (A) AlphaFold-Multimer (AF-M) prediction of DONSON's structure. Sites predicted to bind interacting proteins are indicated with arrows. All proteins shown are from Xenopus, but the predicted human complexes appear almost identical. (B to F) AF-M predictions of relevant DONSON domains complexed with SLD5 (B), TOPBP1 (C), POLE2 (D), MCM3 (E), and a second copy of DONSON (F). The amino acids (aa) of each protein shown are indicated in brackets. (G) Functional domains of TOPBP1. (H) AF-M predictions suggest that a pre-LC consisting of DONSON, GINS, Pol ε, and TOPBP1 docks onto the MCM2-7 complex through the predicted DONSON-MCM3 interaction. (Top) The predicted Xenopus pre-LC is shown with only the POLE2 subunit of Pol ε and just the BRCT3 (aa 343-447), GINI (aa 475-492), and BRCT4-5 (aa 538-734) domains of TOPBP1. Disordered regions of DONSON are shown as dotted lines but omitted for TOPBP1. Residues located at the ends of well-ordered segments are shown in green and purple for DONSON and TOPBP1, respectively. (Bottom) The pre-LC was docked onto the cryo-EM structure of human CMG (PDB: 7PLO) (64) by aligning on MCM3. Only CDC45 and MCM2-7 of human CMG are shown. See Methods for modeling details. (I) The pre-LC from (H) rotated by 90 degrees. Residues located at the ends of well-ordered segments are numbered and shown in green and purple for DONSON and TOPBP1, respectively.

We therefore used AlphaFold-Multimer
(AF-M) (35) to screen in silico for potential 
DONSON interactors among a common set of 
~70 core DNA replication factors in humans, 
frogs, worms, and flies. Based on confidence 
metrics generated by AlphaFold, the top pro- 
teins predicted to interact with DONSON in 
all four species included SLD5 (a GINS sub- 
unit), TOPBP1, POLE2 (a Pol € subunit), and 
MCM3 (a MCM2-7 subunit), all of which are 
implicated in CMG assembly [Fig. 2, B to E;
table S1 for AlphaFold confidence values; fig.
S2 for structures colored by predicted local distance
difference test (pLDDT) values; fig. S3 for
predicted alignment error plots]. DONSON
was also strongly predicted to interact with 
itself (Fig. 2F and fig. S3F). Thus, AF-M-based 
in silico screening was consistent with DONSON 
functioning during CMG assembly. 
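This section does not say exactly how the AlphaFold confidence metrics were combined into a ranking. As an illustration only, the sketch below assumes the AlphaFold-Multimer model-confidence score (0.8 × ipTM + 0.2 × pTM) and keeps candidates that clear a cutoff in every species, ordered by their mean score. The scores, the 0.6 cutoff, the placeholder candidate names, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical (ipTM, pTM) scores per species for DONSON paired with each candidate.
Scores = Dict[str, Dict[str, Tuple[float, float]]]


def model_confidence(iptm: float, ptm: float) -> float:
    """AlphaFold-Multimer model confidence: 0.8 * ipTM + 0.2 * pTM (assumed metric)."""
    return 0.8 * iptm + 0.2 * ptm


def conserved_hits(scores: Scores, cutoff: float) -> List[str]:
    """Candidates scoring >= cutoff in every species, ordered by mean confidence (best first)."""
    shared = set.intersection(*(set(partners) for partners in scores.values()))
    kept = {}
    for cand in shared:
        confs = [model_confidence(*scores[sp][cand]) for sp in scores]
        if min(confs) >= cutoff:
            kept[cand] = sum(confs) / len(confs)
    return sorted(kept, key=kept.get, reverse=True)


# Invented numbers for three placeholder candidates in two species.
example: Scores = {
    "human": {"candidate_A": (0.85, 0.80), "candidate_B": (0.80, 0.70), "candidate_C": (0.30, 0.60)},
    "frog": {"candidate_A": (0.82, 0.78), "candidate_B": (0.75, 0.70), "candidate_C": (0.70, 0.65)},
}
print(conserved_hits(example, cutoff=0.6))  # ['candidate_A', 'candidate_B']
```

Requiring the cutoff in every species mirrors the idea, stated above, of looking for partners predicted in all four organisms; candidate_C is dropped because it scores well in only one.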
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AF-M predicted that DONSON binds TOPBP1, 
MCM3, POLE2, SLD5, and itself through five 
distinct regions as follows: DONSON’s dis- 
ordered N terminus was predicted to bind SLD5 
(Fig. 2B and fig. S3A) and when DONSON was 
folded with the tetrameric GINS complex, the 
interaction was extended to DONSON’s globular 
domain (Fig. 21 and fig. S3B) and the confidence 
of the interaction increased in most organisms 
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(table S1). An adjacent disordered DONSON 
peptide was predicted to bind POLE2 (Fig. 2D 
and fig. S3D). Binding to the AAA+ domain of 
MCMs3 was predicted to involve the flexible 
loop that protrudes from DONSON’s globular 
domain (Fig. 2E and fig. S3E). DONSON’s glob- 
ular domain was predicted to bind the BRCT3 
domain of TOPBP!1 that is essential for DNA 
replication (Fig. 2, C and G, and fig. S3C) (8). 
Finally, another part of the globular domain 
was predicted to mediate dimerization (Fig. 2F 
and fig. S3F). Notably, AF-M predicted that 
DONSON can contact all its potential binding 
partners simultaneously (figs. S4 and S5 and 
data S1). When folded with two copies of 
DONSON, the TOPBP1 BRCT3 domain was 
predicted to bind at the DONSON dimer 
interface (fig. S4, A and D, and data S1), sug- 
gesting that it might stabilize a DONSON dimer. 

Based on these in silico results, we hypothesized that DONSON organizes a vertebrate pre-LC that includes GINS, TOPBP1, and Pol ε (Fig. 2H, top, and Fig. 2I). We further postulated that this pre-LC docks onto MCM3 to deliver GINS to the CDC45-MCM2-7 complex (Fig. 2H). We confirmed DONSON dimerization using mass photometry (fig. S6). This observation suggests that the pre-LC dimerizes (see Discussion), but for simplicity it is depicted as a monomer. When the pre-LC was docked onto the cryo-electron microscopy (cryo-EM) structure of the replisome through the predicted DONSON-MCM3 interaction, GINS from the pre-LC aligned well with GINS on CMG [root mean square deviation (RMSD) = 5.3 Å, fig. S7]. Thus, our modeling suggests that DONSON promotes CMG assembly by delivering GINS directly to its binding site on CDC45-MCM2-7 (Fig. 2H).
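The docking result rests on superimposing the pre-LC model onto CMG using MCM3 alone and then measuring how far the model's GINS sits from GINS in the cryo-EM structure. The exact procedure and software are described in the Methods; the sketch below shows one standard way to do this, assuming matched Cα coordinates for both subunits are already loaded as NumPy arrays. It fits a Kabsch superposition on the anchor (MCM3) atoms only and reports the RMSD of the probe (GINS) atoms without refitting. Function names and the synthetic test data are hypothetical.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation r and translation t that best map mobile (N, 3) onto target (N, 3)."""
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)  # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; correct it
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - mu_m @ r.T
    return r, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def docked_rmsd(model_anchor, ref_anchor, model_probe, ref_probe) -> float:
    """Superimpose on the anchor atoms only (e.g. MCM3), then score the probe atoms (e.g. GINS)."""
    r, t = kabsch(model_anchor, ref_anchor)
    return rmsd(model_probe @ r.T + t, ref_probe)


# Synthetic check: a rigidly rotated and shifted copy should give an RMSD close to 0.
rng = np.random.default_rng(0)
ref_anchor, ref_probe = rng.normal(size=(50, 3)), rng.normal(size=(30, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation
shift = np.array([5.0, -2.0, 1.0])
model_anchor, model_probe = ref_anchor @ q.T + shift, ref_probe @ q.T + shift
print(docked_rmsd(model_anchor, ref_anchor, model_probe, ref_probe))
```

Fitting on the anchor only is the point of the design: a low probe RMSD then means the docking pose itself places GINS near its CMG site, rather than GINS having been fitted independently.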


DONSON organizes a pre-LC containing GINS, 
TOPBP1, and Pol ε


To test the model presented in Fig. 2, we investigated which factors interact with DONSON in nonreplicating nucleoplasmic extract. Recombinant FLAG-tagged DONSON (fig. S1B, right panel) was added to extract and immunoprecipitated (IP'ed). FLAG-DONSON co-IP'ed GINS, TOPBP1, POLE2, and POLEcat but not RECQL4 or MCM3 (Fig. 3A). Reciprocal IP of endogenous GINS recovered DONSON, TOPBP1, POLE2, and POLEcat (Fig. 3B). We conclude that, independently of DNA replication, DONSON forms a stable pre-LC with GINS, TOPBP1, and Pol ε, but not with MCM3 or RECQL4.

We next probed the architecture of the pre-LC using site-directed mutagenesis. Residues Y8, N430, and N67 in DONSON were predicted to interact with GINS, TOPBP1, and POLE2, respectively (fig. S8, A to C). Indeed, mutation of each residue to alanine led to defective co-IP of GINS, TOPBP1, and Pol ε, respectively (Fig. 3C, lanes 8 to 10, red bars; Fig. 3D). DONSON^Y8A failed to co-IP not only GINS, as predicted, but also TOPBP1 (Fig. 3C, blue bar, and Fig. 3D). This suggests that TOPBP1 binds cooperatively to DONSON and GINS. Consistent with this idea, TOPBP1 binds GINS through two elements: a short "GINI" peptide located in a disordered region of TOPBP1 and the BRCT4-5 domains located C-terminal to the GINI peptide (Fig. 2G and fig. S9, A to C) (16, 36). Unlike the GINI peptide, BRCT4-5 is not essential for DNA replication, but it stabilizes the interaction of TOPBP1 with GINS (16, 36) and DONSON (fig. S10). BRCT4-5 binds the same site on GINS that is occupied by POLE2 in the fully assembled replisome (fig. S9D) (36). We therefore propose that within the pre-LC, TOPBP1's BRCT4-5 and GINI domains occupy the POLE2 binding site on GINS. Furthermore, our observation that DONSON interacts with Pol ε independently of GINS and TOPBP1 (Fig. 3C, lane 8) suggests that Pol ε is flexibly tethered by POLE2 to the pre-LC through DONSON's disordered N-terminal tail (Fig. 2H) and that it associates with GINS only after CMG assembly.
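How interface residues such as Y8, N430, and N67 were selected from the models is not spelled out here. A common, simple way to shortlist candidate interface residues in a predicted complex is an interatomic distance cutoff between chains. The sketch below assumes per-residue atom coordinates have already been parsed from the model file and uses a hypothetical 4 Å cutoff; the residue labels K20 and E90 and all coordinates are invented for illustration.

```python
import numpy as np


def interface_pairs(chain_a: dict, chain_b: dict, cutoff: float = 4.0) -> list:
    """(res_a, res_b, min_dist) for residue pairs whose closest atoms lie within cutoff (Å).

    chain_a and chain_b map a residue label (e.g. "Y8") to an (n_atoms, 3) coordinate array.
    """
    pairs = []
    for ra, xa in chain_a.items():
        for rb, xb in chain_b.items():
            # All atom-to-atom distances between the two residues
            d = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
            dmin = float(d.min())
            if dmin <= cutoff:
                pairs.append((ra, rb, dmin))
    return sorted(pairs, key=lambda p: p[2])


# Invented coordinates; only the labels Y8 (DONSON) and H76 (SLD5) come from the text.
chain_a = {"Y8": np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]), "K20": np.array([[20.0, 0.0, 0.0]])}
chain_b = {"H76": np.array([[4.5, 0.0, 0.0]]), "E90": np.array([[30.0, 0.0, 0.0]])}
print(interface_pairs(chain_a, chain_b))  # [('Y8', 'H76', 3.5)]
```

A distance cutoff only nominates candidates; as the text shows, each one still needs a mutant and a co-IP to confirm it matters.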

We next asked whether purified DONSON, GINS, and TOPBP1 are sufficient to form a complex. We omitted Pol ε because its association with the pre-LC is not essential for CMG assembly [(19, 20); see below]. Indeed, purified DONSON^WT co-IP'ed TOPBP1 and GINS, whereas DONSON^Y8A did not, and DONSON^N430A IP'd GINS but recovered little TOPBP1 (fig. S11B, lanes 22 to 24), as seen in extracts (Fig. 3C). Furthermore, DONSON^WT and DONSON^N430A, but not DONSON^Y8A, bound efficiently to GINS in the absence of TOPBP1 (fig. S11B, lanes 14 to 16). We conclude that DONSON, GINS, and TOPBP1 are sufficient to form the core of a vertebrate pre-LC that also associates with Pol ε. Pre-LC formation appears to be independent of CDK activity (Fig. 3E), consistent with the fact that the essential TOPBP1 BRCT domain that contacts DONSON (BRCT3) is not predicted to bind phospho-peptides (37). These biochemical experiments provide powerful support for the pre-LC architecture predicted by AF-M.


DONSON binding to GINS and TOPBP1, but not 
Pol ε, is required for CMG assembly


We next assessed whether pre-LC assembly is required for DNA replication. DONSON^Y8A (defective in GINS and TOPBP1 recruitment) did not support efficient DNA replication (Fig. 4A) or CMG assembly (Fig. 4B, lane 6). Additional DONSON mutations at the predicted DONSON-GINS interface provided further evidence that poor GINS binding correlates with inefficient DNA replication (fig. S12). To perturb the other side of the DONSON-GINS interface, we mutated histidine 76 in the SLD5 subunit of GINS, which is predicted to contact Y8 in DONSON (fig. S13, A and B). As shown in fig. S13, C and D, the mutant GINS co-IP'd DONSON weakly and supported only low levels of DNA replication. Thus, DONSON's interaction with GINS is essential for CMG assembly and efficient DNA replication.

We performed a similar analysis for the DONSON-TOPBP1 interaction. DONSON^N430A (defective in TOPBP1 recruitment) supported inefficient CMG assembly and DNA replication (Fig. 4, A and B). Additional DONSON mutations predicted to disrupt the DONSON-TOPBP1 interaction also compromised TOPBP1 co-IP and DNA replication (fig. S14). Conversely, TOPBP1 mutations engineered at the predicted DONSON-TOPBP1 interface showed a correlation between poor DONSON binding and inefficient DNA replication (fig. S15). These results indicate that, similar to the DONSON-GINS interaction, the DONSON-TOPBP1 interaction is required for CMG assembly and DNA replication.

By contrast, DONSON^N67A, which failed to bind Pol ε (Fig. 3C), supported almost normal levels of CMG assembly and DNA replication (Fig. 4, A and B), consistent with previous studies showing CMG formation in Pol ε-deficient egg extracts and human cells (19, 20). This result is also consistent with Pol ε being loosely associated with the pre-LC through DONSON and binding tightly to the replisome only after CMG has been assembled. Together, our data indicate that DONSON, GINS, and TOPBP1 form the core of an essential vertebrate pre-LC that chaperones GINS onto chromatin-bound MCM2-7. Unlike the yeast pre-LC, the vertebrate pre-LC contains DONSON instead of Sld2; Pol ε, though present, is dispensable for CMG assembly.


DONSON’s predicted MCM3 binding domain is 
necessary for CMG assembly 


AF-M predicts with high confidence that an α-helix on DONSON's flexible loop interacts with MCM3 (Fig. 2E, fig. S3E, and table S1). To test whether this predicted interaction delivers GINS to the chromatin (Fig. 2H), we mutated three acidic residues (D374, E377, E384) and a highly conserved tryptophan (W381) to arginines and alanine, respectively (fig. S8D). All four mutations were located in the MCM3-binding helix and together generated DONSON^DEWE>RRAR (Fig. 4C). Although DONSON^DEWE>RRAR was fully competent for pre-LC assembly (Fig. 4D), it was deficient in DDK-dependent chromatin binding and CMG assembly (Fig. 4E, lane 8) and supported inefficient DNA replication (fig. S11E). Because MCM3 did not co-IP with DONSON from nonreplicating egg extract (Fig. 3A and Fig. 4D), we infer that stable binding of DONSON to MCM3 only occurs after TOPBP1 tethers the pre-LC to MCM double hexamers (see Discussion). Consistent with this idea, DONSON^N430A, which did not bind TOPBP1 efficiently (Fig. 3C), was similarly defective as DONSON^DEWE>RRAR in chromatin binding
Fig. 3. DONSON forms a pre-LC. (A) Recombinant FLAG-tagged DONSON (rFLAG-DONSON) was added to nonreplicating nucleoplasmic egg extract (NPE), recovered, and blotted for the indicated proteins alongside the input extract. (B) Endogenous GINS was immunoprecipitated from NPE using PSF3 antibody and blotted for the indicated proteins. (C) rFLAG-DONSON proteins containing specified mutations (figs. S1B and S11A) were added to NPE, recovered, and blotted as indicated. Red and blue bars show missing pre-LC components. The images are part of the same western blot, which was cropped to remove irrelevant information between lanes 5 and 6. (D) The effects of different DONSON mutants are depicted in the context of the AF-M-modeled pre-LC (as in Fig. 2I). Mutations are indicated as red Xs. (E) The indicated rFLAG-DONSON proteins were added to NPE treated with buffer, p27 (CDKi), or λ phosphatase. DONSON was recovered and blotted for the indicated proteins. Total extract was also blotted for MCM4 to show phosphatase activity.
Fig. 4. Pre-LC formation is required for DNA replication. (A) Egg extracts were depleted of DONSON, supplemented with rCDK2-Cyclin E1 and the indicated DONSON proteins (figs. S1B and S11A), and used to measure DNA replication. Datapoints, n = 4 experiments, except N67A where n = 3. Mean ± SD. (B) Plasmid pull-down (as in Fig. 1D) to assay the effect of DONSON mutations on CMG assembly. The images are part of the same western blot, which was cropped to remove irrelevant information between the input and lane 1. Western blot of total protein levels in these reactions is shown in fig. S11C.


(Fig. 4E, lane 7). Thus, our evidence is con- 
sistent with the predicted DONSON-MCM3 
interaction being important to deliver the pre- 
LC to chromatin for CMG assembly. 
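The compound name DONSON^DEWE>RRAR packs four substitutions into one label: D374R, E377R, W381A, and E384R. A small helper that expands such labels and checks each wild-type residue before substituting can catch numbering mistakes when designing constructs. This is a hypothetical utility, not the authors' workflow, and it is demonstrated on an invented toy sequence rather than the real DONSON sequence.

```python
import re

# One substitution, e.g. "W381A": wild-type residue, 1-based position, replacement residue.
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(seq: str, mutations: list) -> str:
    """Apply point substitutions to a protein sequence, checking each wild-type residue first."""
    residues = list(seq)
    for m in mutations:
        match = MUTATION.fullmatch(m)
        if match is None:
            raise ValueError(f"unrecognized mutation: {m}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = new
    return "".join(residues)


# DEWE>RRAR written out as individual substitutions, as described in the text.
DEWE_TO_RRAR = ["D374R", "E377R", "W381A", "E384R"]

# Toy 400-residue sequence (NOT the real DONSON sequence) with D, E, W, E at the relevant positions.
toy = ["G"] * 400
for pos, aa in [(374, "D"), (377, "E"), (381, "W"), (384, "E")]:
    toy[pos - 1] = aa
mutant = apply_mutations("".join(toy), DEWE_TO_RRAR)
print(mutant[373], mutant[376], mutant[380], mutant[383])  # R R A R
```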


DONSON is required for CMG assembly in 
mammalian cells 

DONSON is essential for cell proliferation in mammals (24). Therefore, to explore DONSON's role in DNA replication, we generated a DONSON-AID2 degron HCT116 cell line that exhibits rapid degradation (≤1 hour) of endogenous DONSON upon addition of the auxin derivative 5-Ph-IAA (fig. S16, A and B) (38). Acute DONSON depletion during ongoing replication impaired fork progression (fig. S16C), and DONSON was associated with replisomes following initiation in egg extracts (fig. S17), consistent with previous studies describing DONSON's role in ongoing replication (23, 26). To test whether mammalian DONSON is also required for replication initiation, DONSON-AID2 cells were synchronized in G1 using lovastatin (39) and released into S phase. In
(C) DONSON^DEWE>RRAR depicted as in Fig. 3D. (D) FLAG-DONSON immunoprecipitation (as in Fig. 3A) showing that DONSON^DEWE>RRAR (expressed in wheat germ extract) is proficient in pre-LC assembly. The images are part of the same western blot, which was cropped to remove irrelevant information between lanes 2 and 3. (E) Plasmid pull-down (as in Fig. 1F) showing that purified recombinant DONSON^N430A and DONSON^DEWE>RRAR (fig. S11A) bind inefficiently to chromatin during replication. Western blot of total protein levels in these reactions is shown in fig. S11D.


the presence of 5-Ph-IAA to deplete DONSON (Fig. 5A), cells retained 2n DNA content, whereas in its absence, DNA content increased (Fig. 5B and fig. S16D). Furthermore, DONSON-depleted cells showed no detectable 5-ethynyl-2'-deoxyuridine (EdU) incorporation (Fig. 5C and fig. S16E). Despite the absence of replication, Cyclin A, Cyclin E, and CDK2 levels were unaffected by DONSON depletion after release from the G1 arrest (Fig. 5A and fig. S16F), consistent with normal cell cycle progression. The parental HCT116 OsTIR1(F74G) cell line containing
Fig. 5. DONSON is required for DNA replication and CMG assembly in mammalian cells. (A) Immunoblot of synchronized DONSON-AID2 HCT116 cells. TCE, total cell extract; Asy, asynchronous; G1, G1-arrest; R, release, hours. (B) G1-synchronized cells fail to progress into S phase after 5-Ph-IAA depletion of DONSON. (C) DONSON depletion prevents DNA synthesis. EdU pulse-labeling at 17 hours post release. Flow cytometry plots in panels (B) and (C), representative of n = 4 and n = 3 experiments, respectively, are quantified in fig. S16, D and E, respectively. (D and E) DONSON is required for CDC45 and GINS recruitment to chromatin. (D) Immunoblots, soluble extract (Sol.) and chromatin-bound proteins (Chrom.) 17 hours post release. (E) Quantification, normalized to loading control (soluble, α-Tubulin; chromatin, Histone H2B) and wild-type protein levels. Datapoints, n = 3 experiments. Mean ± SEM.


untagged DONSON was unaffected by 5-Ph- 
IAA (fig. S16G). In addition, chromatin recov- 
ery by cell fractionation showed that when 
cells were released from G1 in the absence of 
DONSON, GINS and CDC45 loading were subs- 
tantially reduced (Fig. 5, D and E). In summary, 
our data show that DONSON is essential for 
CMG assembly in mammalian somatic cells, 
and they reinforce prior findings that DONSON 
promotes efficient replication fork progression. 
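The Fig. 5E legend states that band intensities were normalized to a loading control (α-Tubulin for soluble, H2B for chromatin fractions) and to wild-type levels, then summarized as mean ± SEM over three experiments. A minimal sketch of that arithmetic, assuming raw band intensities are already measured and reading "wild-type" as the reference (untreated) condition; the numbers and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_level(signal: float, loading: float, ref_signal: float, ref_loading: float) -> float:
    """(signal / loading control) relative to the same ratio in the reference condition."""
    return (signal / loading) / (ref_signal / ref_loading)


def mean_sem(values: list) -> tuple:
    """Mean and standard error of the mean across independent experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


# Invented densitometry values: chromatin GINS2 and H2B, with 5-Ph-IAA vs. the reference.
print(normalized_level(signal=400.0, loading=1000.0, ref_signal=1600.0, ref_loading=800.0))  # 0.2
```

Dividing by the loading control first corrects for unequal sample loading; dividing by the reference ratio then expresses each condition as a fraction of the control level.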


Defective CMG assembly is associated with 
microcephalic dwarfism in a mouse model 


Biallelic mutations that cause microcephalic 
dwarfism are clustered in the globular domain 


Lim et al., Science 381, eadi3448 (2023)




of DONSON (fig. S18A) and result in partial 
loss of function by reducing protein levels (23). 
Although most microcephalic dwarfism cases 
are compound heterozygotes, several homozy- 
gous mutations have been identified that are 
more easily modeled in isogenic systems. One 
such variant, M446T (23), was introduced into 
mice (at the corresponding location M440) 
through CRISPR/Cas9 genome editing (fig. 
S18B). The resulting M440T/M440T mouse 
exhibited microcephaly, reduced body size, 
and decreased limb length, confirming patho- 
genicity of the mutation at the organismal level 
and recapitulating a severe form of the human 
phenotype (Fig. 6, A to C, and fig. S18C). As 


22 September 2023 


previously reported in patient-derived cells (23), DONSON protein levels were substantially reduced in Donson^M440T/M440T mouse embryonic stem cells (mESCs; fig. S19A); DNA combing demonstrated fork asymmetry (fig. S19B), and ATR-mediated checkpoint signaling was attenuated (fig. S19C). Additional analysis demonstrated that in both M440T/M440T mESCs and mouse embryonic fibroblasts (MEFs), interorigin distance was substantially increased (Fig. 6D and fig. S19D), whereas fork velocity (fig. S19E) and cell proliferation (fig. S19F) were reduced, consistent with a deficit in functional replisomes. Furthermore, the M440T mutation decreased chromatin-bound
Fig. 6. Homozygous M440T mutation impairs replication initiation and causes growth restriction and microcephaly in a mouse model. (A to C) Donson^M440T/M440T E13.5 mouse embryos exhibit growth restriction by mid-gestation with microcephaly and limb abnormalities. (A) Lateral view; scale bar 1 mm. Datapoints, individual mice, Mean ± SEM, t-test. Occipit.-front dist, occipital-frontal distance. (B) Oligodactyly in forelimb; scale bar 0.2 mm. See also fig. S18C. (C) Reduced cellularity as measured by cortical thickness is evident in the developing forebrain during neurogenesis (E12.5). Scale bar 20 μm. Measurements at dorsal-most point of telencephalon; datapoints, individual mice; Mean ± SEM, t-test. (D) Increased interorigin distance (IOD) in Donson^M440T/M440T cells. Representative images of dU-analog pulse-labeled DNA fibers. White brackets, measured IODs, mESCs. Data points plotted, fibers, pooled from n = 2 combing experiments. Kb, kilobases. Median with 95% confidence interval; U-test. 91 wild-type and 53 M440T/M440T fibers were scored for IODs. (E) Reduced chromatin-associated GINS and Cdc45 indicates impaired CMG assembly in Donson^M440T/M440T mESCs. (Left) cell fractionation immunoblot. (Right) quantification. n = 3 experiments, mean ± SEM, t-test; normalization as in Fig. 5E.


CDC45 and GINS levels in mESCs (Fig. 6E). 
These replication initiation phenotypes were 
not due to defective checkpoint signaling 
because inhibiting ATR has the opposite effect 
of DONSON deficiency in that it stimulates 
initiation events (origin firing) (40) and en- 
hances CDC45 and GINS recruitment to chro- 
matin (fig. S19G). Hence, in a murine model, 
defective CMG assembly induced by a DONSON 
patient-derived mutation is associated with 
microcephalic dwarfism. 
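Interorigin distance (IOD), used above as a readout of origin firing, is the spacing between neighboring origins on a single DNA fiber; Fig. 6D reports the median of IODs pooled across fibers. The sketch below assumes origin positions (in kb) have already been called from the labeled fibers, which is the hard part in practice; it does not compute the 95% confidence interval or the U-test reported in the figure. The positions and function names are hypothetical.

```python
from statistics import median


def interorigin_distances(origin_positions_kb: list) -> list:
    """Distances between neighboring origins on one fiber (kb), after sorting positions."""
    pos = sorted(origin_positions_kb)
    return [b - a for a, b in zip(pos, pos[1:])]


def pooled_median_iod(fibers: list) -> float:
    """Median IOD pooled over fibers; fibers with fewer than two origins contribute nothing."""
    iods = [d for fiber in fibers for d in interorigin_distances(fiber)]
    return median(iods)


# Invented origin positions (kb) on three fibers.
print(pooled_median_iod([[10, 50, 130], [5, 65], [40]]))  # 60
```

Fewer active origins means larger gaps between the ones that do fire, which is why an increased median IOD is read as reduced initiation.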


Ab initio prediction of protein function using 
AlphaFold-Multimer 


Our initial screen for DONSON interactors was limited to the ~70 replisome proteins (table S1) because we knew from biochemical experiments that DONSON is required for CMG assembly. To assess whether in silico screening alone would point toward DONSON's function in CMG assembly, we used AF-M to assess DONSON's potential interaction with nearly all 20,000 known human proteins. As shown in table S2, TOPBP1, MCM3, POLE2, SLD5, and DONSON were among the 350 most confident DONSON interactors (sheet 2), but GO-term analysis did not identify DNA replication as a DONSON-associated function. However, when we performed a second round of structure predictions, taking into account that SLD5 is part of the tetrameric GINS complex and that TOPBP1 binds at the DONSON dimer interface (apparent from the first round of predictions; figs. S3 and S4), all five CMG assembly factors were among the 28 most confident proteome-wide interactors (table S2, sheet 3). Alternatively, when we considered only the ~500 proteins associated with DONSON in the STRING database (41), CMG assembly factors were among the top 21 most confident interactors (table S2, sheet 4). These results suggest that in silico screening has potential as a general, ab initio approach to identify relevant interactors and thereby elucidate protein function.
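Statements such as "all five CMG assembly factors were among the 28 most confident interactors" amount to finding the worst rank of a known set within a confidence-ordered list. A minimal sketch, assuming the list is already sorted best-first; the ordering below is invented and the placeholder names X1 to X4 are hypothetical, so only the five factor names come from the text.

```python
def rank_cutoff(ranked: list, known: set) -> int:
    """Smallest N such that every protein in `known` is within the top N of `ranked` (best first)."""
    missing = known - set(ranked)
    if missing:
        raise ValueError(f"not in ranking: {sorted(missing)}")
    return max(ranked.index(p) for p in known) + 1


# Invented ordering for illustration only.
ranked = ["X1", "SLD5", "X2", "TOPBP1", "DONSON", "X3", "MCM3", "POLE2", "X4"]
known = {"SLD5", "TOPBP1", "MCM3", "POLE2", "DONSON"}
print(rank_cutoff(ranked, known))  # 8
```

Comparing this cutoff across screening strategies (proteome-wide, second round, STRING subset) gives a single number for how sharply each strategy concentrates the true partners near the top.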


Discussion 


Our data support a model in which DONSON scaffolds formation of a large pre-LC that delivers GINS to origins for CMG assembly (fig. S20). The predicted docking of DONSON onto MCM3 places GINS close to its binding site on CMG, suggesting that DONSON functions not only as a pre-LC scaffold but also as a molecular matchmaker. Mutations designed to disrupt the DONSON-MCM3 interaction impaired DONSON binding to chromatin, CMG assembly, and DNA replication. However, the pre-LC did not co-IP with MCM2-7 in nonreplicating extract, suggesting that the MCM3-DONSON interaction is context-dependent. Thus, we speculate that the pre-LC is first recruited to MCM double hexamers through phospho-dependent binding of TOPBP1's BRCT0-2 repeats to TRESLIN (fig. S20A), followed by DONSON docking onto MCM3 (fig. S20, B and C, pink and blue circles). Because we have not
been able to detect a direct physical interac- 
tion between DONSON and MCM3, the con- 
clusion that the pre-LC docks onto MCM3 
remains tentative. DONSON depletion disrup- 
ted not only GINS but also CDC45 recruitment 
to origins whereas TRESLIN and MTBP recruit- 
ment were unaffected. Given that CDC45 is 
known to bind MCMs weakly in the absence 
of GINS (5, 6), we favor the idea that CDC45 
associates with origins normally in the ab- 
sence of DONSON but dissociates during chro- 
matin isolation due to the lack of full CMG 
assembly. We showed that CDC45 and GINS 
recruitment to chromatin were also defective 
in DONSON-deficient mammalian cells. To- 
gether with structure predictions in different 
organisms (table S1), evidence for DONSON- 
dependent GINS and CDC45 recruitment to 
chromatin in nuclear assembly egg extracts 
(42, 43), and data from worms (44), our evi- 
dence suggests that DONSON is generally 
required for CMG assembly in metazoans. 

Our results also shed light on the role of Pol ε in replication initiation. A DONSON mutant (N67A) that disrupts Pol ε retention in the pre-LC has little effect on CMG assembly or replication efficiency, consistent with previous results that Pol ε is not required for CMG formation (19, 20). Notably, TOPBP1's BRCT4-5 domain binds GINS on the same site that is occupied by POLE2 in the replisome (36) (fig. S9D), and Pol ε binds DONSON independently of GINS and TOPBP1, as shown by the DONSON^Y8A mutant. Therefore, we propose that in the pre-LC, TOPBP1 uses its GINI and BRCT4-5 domains to occupy GINS, whereas Pol ε is flexibly attached, binding primarily to DONSON's N-terminal disordered region (fig. S20A); after pre-LC docking and CMG assembly, Pol ε binds cooperatively to GINS and MCM2-7, displacing TOPBP1's BRCT4-5 domains from GINS, which causes TOPBP1 dissociation from CMG (fig. S20, C and D).

In yeast, CDK-phosphorylation of Sld2 pro- 
motes Sld2 binding to Dpb11, which underlies 
pre-LC assembly. However, the closest verte- 
brate Sld2 homolog, RECQL4, functions down- 
stream of CMG assembly (17, 18). Although 
DONSON and Sld2 share no sequence or 
structural homology and DONSON’s interac- 
tion with TOPBP1/Dpb11 does not appear to 
be regulated by phosphorylation, we propose 
that DONSON has replaced the function of 
Sld2 in vertebrate pre-LC assembly. DONSON 
is predicted to interact with Cyclin A and 
Cyclin E (table S1 and fig. S21), and DONSON 
depletion partially codepletes CDK2-Cyclin E, 
raising the possibility of a functional interplay 
between DONSON and CDKs that does not involve TOPBP1. Our evidence suggests a unified model in which CMG assembly involves a
pre-LC in both yeast and metazoa but utilizes 
different architectures. 

DONSON forms a dimer (Fig. 2F, fig. S3F, 
and fig. S6), and dimerization does not clash 
with DONSON binding to other pre-LC com- 
ponents or MCM3 (figs. S4 and S5), suggesting 
that a dimeric pre-LC might dock onto MCM 
double hexamers (fig. S20A). However, given 
its predicted dimensions, the two MCM3 
binding helices of the DONSON dimer would 
not be able to contact both MCM3s at the 
same time (fig. S20B, pink and black circles, 
and data S2). We speculate that disengage- 
ment and clockwise rotation of the two MCMs, 
as seen during CMG assembly in yeast (45), 
might enable DONSON dimer binding to both 
MCM3 molecules simultaneously (fig. S20C; 
pink and blue circles). Whether such a se- 
quential CMG assembly mechanism occurs 
and whether it leads to concerted activation 
of sister replisomes is an important question 
for future studies. 

Previous studies using siRNA and patient- 
derived cell lines implicated DONSON in ATR 
signaling, replication elongation, fork protection, 
and interstrand crosslink traversal (23, 26). In 
agreement, using degron-allele and isogenic 
mutant cell lines, we found that DONSON not 
only promotes CMG assembly but also affects 
downstream DNA replication events, including 
fork progression, fork stability, and checkpoint 
signaling. The interplay of these various func- 
tions is likely to be complex. Deficient check- 
point signaling may reflect a direct involvement 
of DONSON in ATR activation but could also 
be an indirect consequence of reduced origin 
firing. However, defective fork progression 
cannot be accounted for by DONSON’s role 
in replication initiation because reduced origin 
firing generally leads to faster fork rates as 
replication resources become more abundant 
(46). Indeed, knock-down of MTBP or TRESLIN, 
which act upstream of DONSON, reduces origin 
firing and increases fork rates, but unlike 
DONSON, does not cause fork asymmetry 
(11, 12, 47). Thus, DONSON's phenotypes are consistent with a dual role in initiation and elongation. The mechanism by which DONSON acts downstream of replication initiation is unclear, especially given recent structural data (48) indicating that Pol α binding to CMG would be incompatible with DONSON's predicted interaction with CMG. One possibility is that if Pol α dissociates from CMG, DONSON
binds, preventing GINS dissociation and/or 
regulating the replication stress response di- 
rectly, similar to TOPBP1. 

Meier-Gorlin syndrome (MGS), defined by 
growth restriction, microtia, and patella agene- 
sis (49), is a form of microcephalic dwarfism 
specifically associated with genes encoding 
replication initiation factors (22). DONSON's role in CMG assembly provides a mechanistic explanation for DONSON mutations discovered in MGS patients (50, 51). As replication licensing
defects reduce cell proliferation during develop- 
ment (52), impaired CMG assembly could also 
limit embryonic cell divisions. This would re- 
duce total cell number, resulting in the hypo- 
cellularity that underlies microcephalic dwarfism 
(21). Cell cycle progression is expected to be 
further impaired by slow fork progression ob- 
served when DONSON function is compromised. 
DONSON mutations are also associated with 
other microcephalic dwarfism disorders (micro- 
cephaly, short stature, and limb abnormalities, 
and microcephaly-micromelia syndrome), where 
brain size is disproportionately affected, and 
limb reduction abnormalities are evident (23, 24). 
These conditions may represent more severe 
forms of the same phenotypic spectrum; alter- 
natively, they might reflect the disruption of 
other DONSON functions (23, 26). 

Our results illustrate the power of in silico 
protein-protein interaction (PPI) screening. 
In a focused screen of DNA replication factors, 
AF-M clearly identified CMG assembly as the 
most likely DONSON function. Even when 
DONSON was screened against the entire 
human proteome, CMG assembly emerged as 
a probable DONSON function, especially when 
select hits were subjected to a second round 
of predictions guided by a knowledge of their 
quaternary structure. A much less computa- 
tionally intensive approach that also identified 
DONSON’s functional partners involved AF-M 
screening of the ~500 proteins associated with 
DONSON in the STRING database. More pro- 
teins will have to be analyzed to develop robust 
and general strategies that successfully leverage 
structure prediction for ab initio discovery of 
protein function. Nevertheless, our results 
illustrate that, when combined with careful 
experimental validation, in silico PPI screen- 
ing has great potential to accelerate mecha- 
nistic discovery. 


Materials and Methods 
Xenopus egg extracts and in vitro DNA 
replication 


Experiments involving adult female (Nasco 
Cat #LM0053MX) Xenopus laevis performed 
at Harvard Medical School were approved by 
the Harvard Medical Area Standing Committee 
on Animals (HMA IACUC Study ID IS00000051-6, approved 10/23/2020, and IS00000051-9, pending approval). The institution has an approved Animal Welfare Assurance (D16-00270)
from the NIH Office of Laboratory Animal 
Welfare. 

Xenopus egg extracts were prepared as described previously (53). To execute in vitro DNA replication, the high-speed supernatant (HSS) of total egg lysate was first incubated with 15 ng of pBlueScript I per μL of HSS (30 ng
to study binding of DONSON to chromatin, Figs. 1F and 4E, and fig. S17) for 30 min at room
temperature to promote replication licensing. Optionally, to inhibit licensing, HSS was supplemented with 0.4 μM of recombinant His-Geminin and incubated for 10 min at room temperature before addition of plasmids. Replication was then initiated by adding two volumes of nucleoplasmic extract (NPE) supplemented with 1.93 mM DTT, 1.8 mM ATP, 18 mM phosphocreatine, and 4.5 μg/mL creatine phosphokinase. Where indicated, NPE was supplemented with 50 μg/mL recombinant GST-p27 ("CDKi") or 50 μM PHA-767491 (Sigma-Aldrich PZ0178, "DDKi") and pre-incubated for 15 min at room temperature before addition to HSS to inhibit CMG assembly.
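Because plasmid is dosed per μL of HSS and replication is started with two volumes of NPE, the plasmid concentration in the complete reaction is one third of the licensing concentration. A small sketch of that bookkeeping, assuming the volume of plasmid solution and of any NPE supplements is negligible; the function name and the 5 μL example are hypothetical.

```python
def replication_reaction(hss_ul: float, plasmid_ng_per_ul_hss: float = 15.0, npe_volumes: float = 2.0) -> dict:
    """Volumes and final plasmid concentration for an HSS + NPE reaction.

    Assumes the added plasmid and supplement volumes are negligible.
    """
    plasmid_ng = plasmid_ng_per_ul_hss * hss_ul
    npe_ul = npe_volumes * hss_ul
    total_ul = hss_ul + npe_ul
    return {
        "plasmid_ng": plasmid_ng,
        "npe_ul": npe_ul,
        "total_ul": total_ul,
        "final_ng_per_ul": plasmid_ng / total_ul,
    }


# 5 uL HSS licensed with 15 ng/uL plasmid, then 10 uL NPE added.
print(replication_reaction(5.0)["final_ng_per_ul"])  # 5.0
```

With the 30 ng/μL HSS condition used for chromatin-binding assays, the same arithmetic gives 10 ng/μL in the final reaction.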


Analysis of total in vitro DNA synthesis 


To monitor overall DNA synthesis, in vitro DNA replication reactions were supplemented with 0.16 μCi/μL of [α-32P]dATP (Perkin Elmer BLU512H500UC). At the indicated times after initiating replication by NPE addition, samples of the replication reactions were quenched in 5 volumes of replication stop buffer (80 mM Tris-HCl pH 8.0, 8 mM EDTA, 0.13% phosphoric acid, 10% Ficoll 400, 5% SDS, 0.2% bromophenol blue) supplemented with 20 μg of proteinase K (Roche 3115879001). The samples were incubated at 37°C for an hour to digest all proteins.

The samples were then separated by native agarose gel electrophoresis, using 0.9% agarose gels and 1× TBE buffer (89 mM Tris, 89 mM boric acid, 2 mM EDTA pH 8.0). The gels were then surrounded by a positively charged membrane (GE/Cytiva Hybond-XL or Roche 11417240001) to prevent loss of nucleic acids, and dried. The dried gels were exposed to phosphor screens and imaged on a Typhoon FLA 7000 PhosphorImager (GE Healthcare). Total DNA synthesis was determined by quantifying the total intensity of each lane using ImageJ.
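Lane quantification in ImageJ essentially integrates pixel intensity over each lane after accounting for background. An equivalent sketch with NumPy, assuming the phosphor image is loaded as a 2D array with lanes running vertically and lane boundaries given as column ranges; the function name, the flat background value, and the toy image are hypothetical.

```python
import numpy as np


def lane_totals(image: np.ndarray, lanes: list, background: float = 0.0) -> list:
    """Integrated signal per lane: sum of (pixel - background), clipped at 0, over columns [start, stop)."""
    totals = []
    for start, stop in lanes:
        region = image[:, start:stop].astype(float) - background
        totals.append(float(np.clip(region, 0, None).sum()))
    return totals


# Toy 4 x 6 "gel": a uniform faint lane and a lane with a strong band.
image = np.zeros((4, 6))
image[:, 0:2] = 1.0
image[1:3, 3:5] = 3.0
print(lane_totals(image, [(0, 2), (3, 5)]))  # [8.0, 12.0]
```

Clipping at zero after background subtraction keeps noise below background from reducing the total; a real analysis may instead use a local, per-lane background.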


Expression and purification of recombinant 
Xenopus DONSON and Xenopus TOPBP1!°°° 


Untagged DONSON, FLAG-DONSON, and 
HA-TOPBP1!°° were cloned into pGEX-6P1 
vectors with sequences encoding a GST-tag and 
3C protease cleavage site on the N terminus. 
Indicated mutations were introduced using a Q5 
Site-Directed Mutagenesis Kit (NEB #E0554S) 
and primers described in table S3. The vectors 
were then transformed into Rosetta (DE3) pLysS cells. For each purification, 1 L of LB media was inoculated with the respective strain and grown to exponential phase at 37°C (OD600 ~0.4 to 0.8). Protein expression was induced with 1 mM IPTG for 16 to 18 hours at 16°C.

The cells were then harvested and resus- 
pended in lysis buffer (50 mM HEPES-KOH 
pH 7.7, 500 mM NaCl, 5% glycerol, 5 mM DTT) 


Lim et al., Science 381, eadi3448 (2023)


supplemented with 1x cOmplete EDTA-free pro- 
tease inhibitor cocktail (Roche 5056489001) and 
200 ug/mL lysozyme. Cells were lysed by sonication
and cleared by centrifugation in a Ti-45
rotor (Beckman Coulter) at 30,000 rpm for 
1 hour at 4°C. The supernatant was collected, 
filtered using a 0.45-um filter (Merck-Millipore 
SLHVR33RS) and incubated with 2 mL of 
Glutathione Sepharose 4B resin (GE/Cytiva 
17075605) for 1 hour at 4°C. The resin was 
washed with 40 column volumes of lysis buffer, 
followed by 40 column volumes of wash buffer 
(50 mM HEPES-KOH pH 7.7, 150 mM NaCl, 5% 
glycerol, 5 mM DTT). The GST tag was cleaved 
by incubating the resin overnight at 4°C with 
800 ug of PreScission Protease (fusion of GST 
and 3C protease). The flowthrough was col- 
lected, concentrated using an Amicon Ultra 
10,000 MWCO centrifugal filter device (Milli- 
pore), cleared by centrifugation at 10,000 x g for 
10 min and subjected to size exclusion chro- 
matography using a Superdex 200 Increase 
10/300 GL column (GE Healthcare) and buffer 
containing 50 mM HEPES-KOH pH 7.7, 300 mM 
NaCl, 5% glycerol, 5 mM DTT. The appropriate 
fractions were collected and pooled, then con- 
centrated using an Amicon Ultra 10,000 MWCO 
centrifugal filter device (Millipore). The con- 
centrations of the purified proteins were quan- 
tified using at least three measurements on a 
NanoDrop One® (ThermoFisher Scientific). 
Finally, the purified proteins were aliquoted, 
snap frozen in liquid N2 and stored at −80°C.


Purification of recombinant Xenopus GINS 


Recombinant GINS used in fig. S11B was the
same preparation used and described previ- 
ously (54). 

Recombinant GINS used in fig. S13 was 
generated using the Acembl/MultiCol system 
(55) by first cloning the PSF1, PSF2, and SLD5 
subunits of Xenopus laevis GINS into a pDC 
donor plasmid and PSF3 (with a C-terminal 
His6 tag connected through an LPETG tag and a
10-aa linker) into a pACE2 acceptor plasmid.
Cre-recombination was then used to assemble
all subunits of GINS into a single expression
plasmid. The H76A mutation in SLD5 was in- 
troduced into the pDC donor plasmid prior to 
Cre-recombination using a Q5 Site-Directed 
Mutagenesis Kit (NEB #E0554S) and primers 
described in table S3. 

The vectors were then transformed into 
Rosetta (DE3) pLysS cells. For each purification, 
1 L of LB media was inoculated with the respective
strain and grown to exponential phase
at 37°C (OD600 ~0.4 to 0.8). Protein expression
was induced with 1 mM IPTG for 4 hours at 
30°C. Cells were then harvested and resus- 
pended in lysis buffer (20 mM Tris-HCl pH 8.0, 
500 mM NaCl, 20 mM Imidazole, 1 mM PMSF, 
1 mM DTT, 5% glycerol) supplemented with
2x cOmplete EDTA-free protease inhibitor
cocktail (Roche 5056489001) and 2 mg/mL
lysozyme. Cells were lysed by sonication and
cleared by centrifugation in a Ti-45 rotor 
(Beckman Coulter) at 30,000 rpm for 1 hour 
at 4°C. The supernatant was collected, fil- 
tered using a 0.45 um filter (Merck-Millipore 
SLHVR33RS) and incubated with 1 mL of Ni-NTA 
Superflow resin (Qiagen) for 1 hour at 4°C. The 
resin was washed with 100 column volumes of 
lysis buffer followed by 5 column volumes of 
elution buffer (20 mM Tris-HCl pH 8.0, 500 mM 
NaCl, 250 mM Imidazole, 1 mM PMSF, 1 mM
DTT, 5% glycerol). The eluate was diluted to 
150 mM NaCl using MonoQ buffer (20 mM 
Tris-HCl pH 7.5, 1 mM DTT, 5% glycerol) and 
subjected to anion exchange chromatography 
using a MonoQ 5/50 GL column (Cytiva) with 
a 150 to 700 mM NaCl gradient in MonoQ 
buffer. GINS eluted at 350 mM NaCl. The ap- 
propriate fractions were collected and pooled, 
then de-salted to 150 mM NaCl using a PD10 
de-salting column and concentrated using an 
Amicon Ultra 10,000 MWCO centrifugal filter 
device (Millipore). The concentrations of re- 
combinant GINS from bacterial expression 
were determined by running aliquots alongside 
a titration of recombinant GINS from Sf9 ex- 
pression (54) on a 4 to 15% Mini-PROTEAN TGX 
Precast Protein Gel stained using InstantBlue 
stain (Novus ISB1L). Total lane intensity was 
quantified using ImageJ. The purified proteins 
were aliquoted, snap frozen in liquid N2 and
stored at −80°C.


Expression of proteins in wheat germ protein 
expression system 


FLAG-DONSON and HA-TOPBP1 (various constructs)
were cloned into pF3A WG (BYDV)
Flexi vectors. Indicated mutations were intro- 
duced using a Q5 Site-Directed Mutagenesis 
Kit (NEB #E0554S) and primers described in 
table S3. Plasmids were maintained in DH5α
cells and purified using QIAprep Spin Miniprep 
Kits (Qiagen). The proteins were expressed in 
the TnT® SP6 High-Yield Wheat Germ Protein 
Expression System (Promega) by mixing 3 
volumes of the extract with 2 volumes of 100 ng/uL 
purified plasmid and incubating at 25°C for 
2 hours. Extracts containing expressed proteins 
were used immediately. 
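The 3:2 mixing ratio above fixes the plasmid concentration in each wheat germ reaction. A minimal sketch of that arithmetic follows; the 25 uL reaction size in the usage line is a hypothetical example, not a value from the source.

```python
def final_concentration(stock: float, stock_parts: float, other_parts: float) -> float:
    """Concentration of a component after mixing stock_parts volumes of it
    with other_parts volumes of another solution."""
    return stock * stock_parts / (stock_parts + other_parts)


def split_reaction(total_ul: float, extract_parts: float = 3, plasmid_parts: float = 2):
    """Split a total reaction volume into (extract, plasmid) volumes, 3:2 as in the text."""
    unit = total_ul / (extract_parts + plasmid_parts)
    return extract_parts * unit, plasmid_parts * unit


# 2 volumes of 100 ng/uL plasmid + 3 volumes of wheat germ extract
print(final_concentration(100.0, 2, 3))  # 40.0 (ng/uL plasmid in the reaction)
# Hypothetical 25 uL reaction: 15 uL extract + 10 uL plasmid (1 ug DNA)
print(split_reaction(25.0))              # (15.0, 10.0)
```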


Immunodepletions and rescue experiments 


For immunodepletion of endogenous DONSON, 
we raised a rabbit polyclonal antibody against 
a peptide comprising amino acids 11 to 23 of 
Xenopus DONSON (Biosynth project #4616). 
0.3 volumes of the 1 mg/mL antibody were in- 
cubated with 1 volume of Dynabeads Protein A 
(Invitrogen 10002D) by gently rotating at 4°C
overnight. 1.5 volumes of extract were immunodepleted
by three rounds of incubation with
1 volume of antibody-charged Dynabeads, by 
gently rotating at 4°C for 1 hour per round. 
For immunodepletion of endogenous TOPBP1, 
we raised a rabbit polyclonal antibody against 
a peptide comprising amino acids 498 to 510
of Xenopus TOPBP1 (Biosynth project #5620).
3 volumes of the 1 mg/mL antibody were incubated
with 1 volume of Protein A Sepharose
Fast Flow antibody purification resin (GE/ 
Cytiva #17127903) by gently rotating at 4°C 
overnight. 5 volumes of extract were immunodepleted
by three rounds of incubation
with 1 volume of antibody-charged Protein A 
Sepharose beads, by gently rotating at 4°C 
for 1 hour per round. 

For immunodepletion of endogenous GINS, 
anti-GINS antibodies (Pocono #34300), affinity- 
purified as previously described (54), were used. 
5 volumes of the 1 mg/mL antibody were incubated
with 1 volume of Protein A Sepharose Fast
Flow antibody purification resin (GE/Cytiva 
#17127903) by gently rotating at 4°C overnight. 
5 volumes of NPE were immunodepleted by
three rounds of incubation with 1 volume of
antibody-charged Protein A Sepharose beads,
by gently rotating at 4°C for 1 hour per round.
5 volumes of HSS were immunodepleted by two
rounds of incubation under the same conditions. 

In DONSON rescue experiments (all except 
fig. S1, B and C), NPE was supplemented with 
0.3 uM final concentration of recombinant 
human CDK2-Cyclin E1 (ProQinase #0050-0055-1)
and incubated for 15 min at room
temperature before initiating replication, to 
compensate for the codepletion of endogenous 
CDK2-Cyclin E during the immunodepletion of 
endogenous DONSON. Recombinant DONSON 
(WT and mutants) was added at a final con- 
centration of 150 nM in NPE and incubated for 
15 min at room temperature before initiating 
replication. 
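Because the recombinant protein is added as a volume on top of the extract, the stock volume needed to reach a given final concentration (150 nM DONSON here, or 0.3 uM CDK2-Cyclin E1) depends on the stock concentration. The sketch below solves c_stock * v / (V_extract + v) = c_final for v. The 3 uM stock and 10 uL extract volume in the example are hypothetical, since the source gives neither stock concentrations nor reaction volumes.

```python
def stock_volume_needed(stock_nM: float, final_nM: float, extract_ul: float) -> float:
    """Volume (uL) of protein stock to add to extract_ul of extract to reach final_nM.

    Solves stock_nM * v / (extract_ul + v) = final_nM for v, so the dilution
    caused by the added volume itself is taken into account.
    """
    if stock_nM <= final_nM:
        raise ValueError("stock must be more concentrated than the target")
    return final_nM * extract_ul / (stock_nM - final_nM)


def resulting_nM(stock_nM: float, added_ul: float, extract_ul: float) -> float:
    """Final concentration after adding added_ul of stock to extract_ul of extract."""
    return stock_nM * added_ul / (extract_ul + added_ul)


# Hypothetical: 3 uM (3000 nM) DONSON stock, 10 uL NPE, 150 nM target
v = stock_volume_needed(3000.0, 150.0, 10.0)
print(round(v, 3))                               # 0.526
print(round(resulting_nM(3000.0, v, 10.0), 6))   # 150.0
```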

In GINS rescue experiments (fig. S13) re- 
combinant GINS was added at a final concen- 
tration of either 180 nM or 270 nM in NPE and 
incubated for 15 min at room temperature before 
initiating replication. 

For rescues using proteins expressed in 
wheat germ extract (figs. S12, S14, and S15),
1 volume of the appropriate wheat germ ex- 
tract was added to 4 volumes of NPE and 
incubated for 15 min at room temperature 
before initiating replication. 


SDS-PAGE and immunoblotting of samples from 
Xenopus egg extract experiments 


All samples to be analyzed were boiled in 
Laemmli buffer (50 mM Tris-HCl pH 6.8, 2% 
SDS, 10% glycerol, 0.1% bromophenol blue, 
5% β-mercaptoethanol). Unless stated otherwise,
samples were run in 4 to 15% Mini-PROTEAN
TGX Precast Protein Gels (Bio-Rad
#4561086) or 4 to 15% Criterion TGX Precast 
Midi Protein Gels (Bio-Rad #5671085) using 
Tris-Glycine-SDS Running Buffer (25 mM 
Tris-HCl pH 8.3, 192 mM glycine, 0.1% SDS). The 
samples were run alongside EZ-Run Prestained 
Rec Protein Ladder (Fisher BioReagents 
#BP36031) to infer the size of the protein bands. 
For Coomassie staining, gels were stained
using InstantBlue stain (Novus ISB1L) for at 
least 1 hour at room temperature. For immuno- 
blotting, gels were transferred to PVDF mem- 
branes (Thermo Scientific #88518) in transfer 
buffer (25 mM Tris pH 8.5, 192 mM glycine, 
20% methanol). The membranes were blocked 
in 1x PBST containing 5% (w/v) nonfat milk
for 30 min at room temperature with gentle 
shaking, and incubated with primary anti- 
bodies diluted in 1x PBST containing 1% (w/v) 
BSA overnight at 4°C with gentle shaking. 
Membranes were then washed extensively 
with 1x PBST and incubated with secondary
antibodies diluted in 1x PBST containing 5% 
(w/v) nonfat milk for 1 hour at room tem- 
perature with gentle shaking. Membranes were 
washed again extensively with 1x PBST, devel- 
oped using ProSignal Pico ECL Spray (Prome- 
theus Protein Biology Products #20-300S) or 
SuperSignal West Dura extended duration 
substrate (Thermo Scientific 34075) and 
imaged using an Amersham Imager 600 (GE 
Healthcare). 

Rabbit polyclonal antibodies against the fol- 
lowing proteins were used as primary anti- 
bodies for Western blotting: 

DONSON (1:5000, described above) 

MCM7 [1:12,000, (3))] 

MCM4 (1:4000, Bethyl #A300-193A, RRID: 
AB_162720) 

MCM3 (1:4000, Santa Cruz (H-215) #sc-292857) 

CDC45 [1:20,000, (56)] 

GINS [1:5000, (54)] 

POLEcat [1:5,000, (57)] 

POLA1 (1:5000, Pocono #35956 raised against
the N-terminal 340-aa fragment of Xenopus
laevis POLA1, used in Figs. 1E and 4E)

PCNA [1:5,000, (58)] 

TOPBP1****"° (1:5000, described above and 
used in fig. S15) 

FLAG [1:5000, (57)] 

HA [1:1000, Cell Signaling (C29F3) #3724, 
RRID: AB_1549585] 

Cyclin E [1:5000, (27)] 

CDK2 [1:5000, (28)] 

Histone H3 (1:500, Cell Signaling #9715, 
RRID: AB_331563).

Mouse monoclonal antibodies against the 
following proteins were used as primary anti- 
bodies for Western blotting: 

GST (1:3000, Cell Signaling #2624, RRID: 
AB_2189875) 

Rabbit polyclonal antibodies against TRE- 
SLIN [1:1000, (8)], MTBP [1:500, (12)], and 
RECQL4 [1:1000, (77)] were generous gifts 
from William Dunphy (California Institute of 
Technology, USA). 

Rabbit polyclonal antibodies against TOPBP1 
[1:2500, (73), used in all TOPBP1 blots except in 
fig. S15] and POLA1 [1:5000, (59), used in all
POLA1 blots except in Figs. 1E and 4E] were
generous gifts from Matthew Michael (Uni- 
versity of Southern California, USA). 
A rabbit polyclonal antibody against POLE2
[1:7500, (19)] was a generous gift from Shou
Waga (Japan Women's University, Japan). 

The following secondary antibodies were 
used: 

Goat anti-rabbit horseradish peroxidase- 
conjugated (Jackson ImmunoResearch, 111- 
035-003, RRID: AB_2313567) at 1:10,000- 
1:30,000 dilution. 

Light chain specific mouse anti-rabbit horse- 
radish peroxidase-conjugated (Jackson Immu- 
noResearch, 211-032-171, RRID: AB_2339149) at 
1:5000 dilution. 

Rabbit anti-mouse horseradish peroxidase- 
conjugated (Jackson ImmunoResearch, 315- 
035-003, RRID: AB_2340061) at 1:2000 dilution.


Plasmid pull-down (chromatin pull-down) 


Plasmid pull-downs were performed essentially
as described (60). Briefly, 1 volume of
streptavidin-coated magnetic beads (Dynabeads 
M-280, Invitrogen 11206D) was incubated with 
6 volumes of binding buffer (50 mM Tris-HCl 
pH 7.5, 150 mM NaCl, 1 mM EDTA, 0.02% 
Tween 20) containing 0.2 uM of biotinylated re- 
combinant LacI for 40 min at room tempera- 
ture. The beads were then washed thrice with 
stop buffer (20 mM HEPES-KOH pH 7.7, 100 mM 
KCl, 5 mM MgCl2, 0.5 M sucrose, 0.25 mg/mL
BSA, 0.03% Tween 20), and resuspended in 
5 volumes of the same buffer. The washed 
beads were then aliquoted and chilled on ice. 

At indicated times after initiating replication 
by NPE addition, samples of replication reac- 
tions were added to the bead aliquots at a 1-to- 
10 ratio and gently rotated for 30 min at 4°C. 
The beads were then washed thrice with wash 
buffer (20 mM HEPES-KOH pH 7.7, 100 mM 
KCl, 5 mM MgCl2, 0.25 mg/mL BSA, 0.03%
Tween 20). Bound proteins were eluted by 
boiling with 1x Laemmli buffer and sub- 
jected to analysis by SDS-PAGE and immuno- 
blotting. For all figures except Figs. 1F, 4E, and 
fig. S17, bound proteins eluted from 6 ng of 
plasmids were loaded in each well. For Figs. 
1F, 4E, and fig. S17, bound proteins eluted 
from 36 ng of plasmids were loaded in each 
well. To infer the efficiency of the plasmid pull- 
downs, an equivalent of 5% of the replication 
reaction subjected to plasmid pull-down (“input”) 
was loaded on the gels alongside the plasmid 
pull-down samples. 


Immunoprecipitation 


For anti-FLAG immunoprecipitations, Anti- 
FLAG M2 Magnetic Beads (Millipore M8823) 
were washed thrice in IP wash buffer (10 mM
HEPES-KOH pH 7.7, 50 mM KCl, 25 mM
MgCl2, 250 mM sucrose, 0.1 mg/mL BSA, 0.02%
Tween 20) and used in aliquots containing 2 
uL of packed beads. The beads were optionally 
pre-immobilized with proteins expressed in 
wheat germ extract (Fig. 4D and figs. S12 and 
S14) by incubating each aliquot of beads with
20 uL of wheat germ extract expressing the
desired FLAG-tagged protein for 1 hour at 4°C 
with gentle rotation. Each aliquot of beads was 
incubated with 15 uL of 30% NPE (diluted in 
IP wash buffer) containing 1 uM of recombinant 
FLAG-DONSON (omitted if beads were pre- 
immobilized with protein expressed in wheat 
germ extract), for 1 hour at 4°C with gentle 
rotation. The beads were then washed thrice 
with cold IP wash buffer. To elute bound 
proteins, each aliquot of beads was incubated 
with 15 uL of IP wash buffer containing 1 mg/
mL of 3x FLAG peptide (Sigma-Aldrich F4799) 
for 1 hour at room temperature with gentle 
rotation. 

For immunoprecipitation of mixed purified
proteins (figs. S11B and S13C), each purified 
protein was added at 0.5 uM to IP wash buffer 
and incubated at room temperature for 15 min. 
These mixtures were incubated with the beads 
instead of NPE, and for 30 min at room tem- 
perature instead of 1 hour at 4°C. Otherwise, 
the immunoprecipitation was performed es- 
sentially as described above. 

For anti-HA immunoprecipitations (figs. S10 
and S15), Anti-HA Magnetic Beads (Pierce 
88836) were washed thrice in IP wash buffer 
and used in aliquots containing 0.1 mg of 
beads (10 uL of bead slurry). The beads were 
pre-immobilized with HA-TOPBP1 proteins 
expressed in wheat germ extract by incubating 
each aliquot of beads with 18 uL of wheat 
germ extract expressing the desired protein 
for 1 hour at 4°C with gentle rotation. Each 
aliquot of beads was then incubated with 15 uL
of 30% NPE (diluted in IP wash buffer) for
1 hour at 4°C with gentle rotation. The beads
were then washed thrice with cold IP wash 
buffer. Bound proteins were eluted by boiling 
each aliquot of beads with 30 uL of 1x Laemmli 
buffer. 

For immunoprecipitation of endogenous PSF3 
(Fig. 3B), 0.3 volumes of 1 mg/mL anti-PSF3 
antibody (Bethyl 61582A) was incubated with 
1 volume of Dynabeads Protein A (Invitrogen 
10002D) by gently rotating at 4°C overnight. 
The antibody was crosslinked to the beads 
using dimethyl pimelimidate (DMP) (Thermo 
Scientific 21666), then washed thrice in IP 
wash buffer and used in aliquots containing 
0.3 mg of beads (10 uL of bead slurry). Each 
aliquot of beads was incubated with 15 uL of 
30% NPE (diluted in IP wash buffer) for 1 hour at 
4°C with gentle rotation. The beads were then 
washed thrice with cold IP wash buffer. Bound 
proteins were eluted by boiling each aliquot of 
beads with 30 uL of 1x Laemmli buffer. 

6 uL of each eluate sample was loaded on 
each gel and analyzed by immunoblotting. To 
infer the efficiency of the immunoprecipita- 
tions, an equivalent amount of extract as that 
subjected to immunoprecipitation (“input”) 
was loaded on the gels alongside the eluate 
samples. In Fig. 3E, extracts were supplemented 
with 50 ug/mL recombinant GST-p27 (“CDKi”)
or 20 U/uL Lambda protein phosphatase (New
England BioLabs P0753) and treated for 30 min
at room temperature prior to immunoprecipitation. 


Mass photometry 


Wild-type recombinant FLAG-DONSON was 
analyzed on a Refeyn TwoMP mass photom- 
eter at the Harvard Medical School Center for 
Macromolecular Interactions. The mass photo- 
meter was calibrated with a protein calibration 
mix containing 10 nM β-amylase (Sigma Aldrich
A8781) and 3 nM Thyroglobulin (Sigma-Aldrich 
609310) prior to taking measurements (concen- 
trations listed were the final concentrations in 
droplet). Recombinant wild-type FLAG-DON- 
SON was diluted to 200 nM in egg lysis buffer 
(10 mM HEPES-KOH pH 7.7, 50 mM KCl, 
2.5 mM MgCl2, 250 mM sucrose). For each measurement,
the objective was focused using an
18 uL droplet of PBS, 2 uL of 200 nM FLAG- 
DONSON was mixed into the droplet, and sam- 
ple data was collected immediately. Figures and 
Gaussian fits were generated using the Refeyn 
DiscoverMP software. 
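Two small calculations underlie this measurement. Mixing 2 uL of 200 nM FLAG-DONSON into an 18 uL droplet gives a final concentration of 20 nM, and the calibration standards are used to convert measured ratiometric contrast into mass, which scales linearly with contrast. The sketch below assumes a straight-line contrast-to-mass calibration fitted by ordinary least squares; the contrast and mass values in the example are hypothetical placeholders, not the standards' actual readings, and DiscoverMP's internal procedure may differ.

```python
from typing import List, Tuple


def droplet_nM(stock_nM: float, sample_ul: float, buffer_ul: float) -> float:
    """Final concentration after mixing sample into a buffer droplet."""
    return stock_nM * sample_ul / (sample_ul + buffer_ul)


def fit_calibration(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares line: mass = slope * contrast + intercept.

    points: (ratiometric contrast, known mass in kDa) for calibration species.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two calibration points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def contrast_to_mass(contrast: float, slope: float, intercept: float) -> float:
    return slope * contrast + intercept


print(droplet_nM(200.0, 2.0, 18.0))  # 20.0
# Hypothetical calibration peaks (contrast, kDa); not real standard values.
slope, intercept = fit_calibration([(-0.01, 100.0), (-0.02, 200.0), (-0.03, 300.0)])
print(round(contrast_to_mass(-0.015, slope, intercept), 3))  # 150.0
```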


AlphaFold2-multimer (AF-M) screen 


To discover novel DONSON interactors within 
DNA replication pathways, we performed an 
in silico screen using the AF-M program de- 
veloped by DeepMind (35, 61). We folded 
DONSON homologs pairwise against the corresponding
core replisome proteins from Homo
sapiens, Xenopus laevis, Drosophila melano- 
gaster, and Caenorhabditis elegans. See table 
S1 for proteins examined in each organism. 

In all cases, we ran all 5 of the AF-M models 
for 3 recycles with version 3 weights, templates 
enabled, and no dropout. These runs were 
performed using a local installation of Colabfold 
v1.5 (62) running on a Linux server equipped 
with 40 GB NVIDIA A100 GPUs. Multiple sequence
alignments (MSAs) and template inputs
to the AlphaFold network were generated 
within the Colabfold pipeline, which routes
protein sequences to another server running 
the MSA software MMseqs2 (63). All predic- 
tions were generated using a combination of 
the paired and unpaired MSAs supplied by 
MMseqs2. Except in one case (fig. S7), predicted
structures were not relaxed.

To analyze the predictions produced by 
AlphaFold 2, we also established a separate 
analysis pipeline written in Python. The analysis
pipeline integrates spatial information
about residues as well as confidence and accu- 
racy metrics predicted by AlphaFold 2. The 
analysis iterates through each residue in a 
protein chain and analyzes its position and 
confidence relative to residues in other pro- 
tein chains to find contacts. We defined a 
contact as a unique pair of residues that have an
average predicted local distance difference
test (pLDDT) > 50, a minimum predicted
aligned error (pAE) < 15 angstroms, and 2
non-hydrogen atoms closer than 8 angstroms.
For each prediction we defined an interface as 
the set of all “contacts” (residue pairs) between 
2 amino acid chains. We generated a series of 
interface statistics such as average pAE and 
average pLDDT, by averaging the individual 
values of these statistics across all contacts. 
We additionally calculated the predicted DockQ
(pDockQ) value for all predictions as an estimate
of the interface accuracy with a score ranging 
from 0 (worst) to 1 (best) (64). 
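The contact criteria and the pDockQ score can be written out directly. The sketch below is a minimal re-expression under stated assumptions, not the authors' Zenodo code (65): residues carry a hypothetical `index` into the PAE matrix, "minimum pAE" is taken as the smaller of the two asymmetric PAE entries for a residue pair, and distance is tested over all heavy-atom pairs. The pDockQ function uses the sigmoid published with pDockQ (64), pDockQ = 0.724 / (1 + exp(-0.052 (x - 152.611))) + 0.018, where x is the mean interface pLDDT multiplied by log10 of the number of interface contacts; the original pDockQ definition counts contacts between Cβ atoms within 8 angstroms, which is not identical to the contact definition above.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass
class Residue:
    chain: str
    index: int                                      # hypothetical: row/column in the PAE matrix
    plddt: float
    heavy_atoms: List[Tuple[float, float, float]]   # non-hydrogen atom coordinates (angstroms)


def find_contacts(chain_a: Sequence[Residue], chain_b: Sequence[Residue],
                  pae: Sequence[Sequence[float]],
                  plddt_cutoff: float = 50.0, pae_cutoff: float = 15.0,
                  dist_cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Residue pairs meeting the pLDDT, pAE and distance criteria described in the text."""
    contacts = []
    for ra, rb in product(chain_a, chain_b):
        avg_plddt = (ra.plddt + rb.plddt) / 2
        # Assumption: "minimum pAE" = smaller of the two asymmetric PAE entries.
        min_pae = min(pae[ra.index][rb.index], pae[rb.index][ra.index])
        if avg_plddt <= plddt_cutoff or min_pae >= pae_cutoff:
            continue
        if any(math.dist(p, q) < dist_cutoff
               for p in ra.heavy_atoms for q in rb.heavy_atoms):
            contacts.append((ra.index, rb.index))
    return contacts


def pdockq(interface_plddts: Sequence[float], n_contacts: int) -> float:
    """pDockQ sigmoid from ref. 64; returns 0.0 when there is no interface (assumption)."""
    if n_contacts == 0 or not interface_plddts:
        return 0.0
    x = (sum(interface_plddts) / len(interface_plddts)) * math.log10(n_contacts)
    return 0.724 / (1 + math.exp(-0.052 * (x - 152.611))) + 0.018


a = Residue("A", 0, 90.0, [(0.0, 0.0, 0.0)])
b_near = Residue("B", 1, 80.0, [(0.0, 0.0, 5.0)])
b_low = Residue("B", 2, 5.0, [(0.0, 0.0, 3.0)])    # fails average pLDDT > 50
b_far = Residue("B", 3, 90.0, [(0.0, 0.0, 20.0)])  # fails 8-angstrom distance
pae = [[5.0] * 4 for _ in range(4)]
print(find_contacts([a], [b_near, b_low, b_far], pae))  # [(0, 1)]
```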

Once a list of contacts had been identified in 
each prediction, these residue pairs/contacts 
were compared across all predictions genera- 
ted for a particular complex. This comparison 
then allowed us to calculate aggregate metrics 
that quantify how well each AlphaFold model’s 
predictions agree on an interface. The two 
primary metrics we calculated were the “average 
n models” and “max n models”. The “average n 
models” statistic represents the average number 
of models that predict each contact and is 
calculated by finding all unique contacts across 
all predictions, counting how many models 
predicted each of these contacts, and then 
averaging the result across all the unique con- 
tacts. This procedure also let us calculate the 
“max n models” which is the maximum num- 
ber of models that predict a specific contact. In 
both cases, the numbers are bounded between 
1 and the number of models run, with higher
values indicating higher levels of agreement 
between models/predictions. Because we always 
ran five models, these metrics range from 1 to 5. 
The data for our DONSON screen across the 
replisome proteins of 4 species is presented 
as table S1. The code we used to analyze the
predictions is available on Zenodo (65). 
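The two agreement statistics can be computed from per-model contact sets, as in the sketch below. Contacts are represented as (residue_i, residue_j) tuples; the model names and the convention of returning (0.0, 0) when no model predicts any contact are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]


def agreement_metrics(contacts_per_model: Dict[str, Set[Contact]]) -> Tuple[float, int]:
    """Return (avg_n_models, max_n_models) over all unique contacts.

    avg_n_models: mean number of models predicting each unique contact.
    max_n_models: largest number of models predicting any single contact.
    """
    counts: Counter = Counter()
    for contacts in contacts_per_model.values():
        counts.update(set(contacts))  # each model counts a contact at most once
    if not counts:
        return 0.0, 0  # assumption: no contacts in any model
    return sum(counts.values()) / len(counts), max(counts.values())


models = {
    "model_1": {(10, 205), (11, 206)},
    "model_2": {(10, 205)},
    "model_3": {(10, 205), (40, 300)},
}
avg_n, max_n = agreement_metrics(models)
print(round(avg_n, 3), max_n)  # 1.667 3
```

With five models per complex, both values fall between 1 and 5 whenever at least one contact is predicted.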

To perform a human proteome-wide screen
for potential DONSON interactors, we downloaded
all the canonical isoform sequences for
20,424 Swiss-Prot reviewed human proteins
from the UniProt web portal on 8 July 2023.
We removed any redundant amino acid
sequences and any proteins that were shorter
than 15 residues or longer than 3034 residues 
(to prevent GPU memory exhaustion). This 
left a set of 20,190 proteins representing 98.8% 
of the known human proteome. We ran all 
these proteins paired with DONSON for 3 
recycles with version 3 weights, templates 
enabled, and no dropout using models 1, 2, 
and 4 through the aforementioned Colabfold 
pipeline. The results were analyzed through 
the same Python analysis script and are summarized
in table S2 sheet 2, where each row
represents 1 pair that was folded. These results 
were sorted by avg_n_models (desc), pdockq 
(desc), avg_interface_pae (asc) and assigned 
a rank. Because we ran 3 models/3 repeats 
for this set, the maximum/best achievable 
avg_n_models and max_n_models for pairs 
was 3. 
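A minimal sketch of the input filtering and ranking described above. Keeping the first accession when several share an identical sequence, and the `PairResult` record layout, are assumptions; the sort keys and their directions (avg_n_models and pdockq descending, avg_interface_pae ascending) follow the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def filter_sequences(seqs: Dict[str, str], min_len: int = 15,
                     max_len: int = 3034) -> Dict[str, str]:
    """Drop redundant sequences and those shorter than min_len or longer than max_len.

    Assumption: when sequences are identical, the first accession seen is kept.
    """
    kept: Dict[str, str] = {}
    seen = set()
    for accession, seq in seqs.items():
        if seq in seen or not (min_len <= len(seq) <= max_len):
            continue
        seen.add(seq)
        kept[accession] = seq
    return kept


@dataclass
class PairResult:
    partner: str
    avg_n_models: float
    pdockq: float
    avg_interface_pae: float


def rank_results(results: List[PairResult]) -> List[Tuple[int, PairResult]]:
    """Sort by avg_n_models (desc), pdockq (desc), avg_interface_pae (asc); rank from 1."""
    ordered = sorted(results, key=lambda r: (-r.avg_n_models, -r.pdockq, r.avg_interface_pae))
    return list(enumerate(ordered, start=1))


seqs = {"P1": "M" * 20, "P2": "M" * 20, "P3": "M" * 10, "P4": "A" * 3035}
print(list(filter_sequences(seqs)))  # ['P1']
```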
To identify proteins previously associated
with human DONSON, we queried the STRING 
database through its REST API on 19 July, 2023. 
We set no minimum limits for required scores 
and retrieved all available entries. This re- 
sulted in a list of 536 proteins that we mapped 
to 507 proteins from our proteome-wide screen. 
The results of this mapping were tabulated in 
sheet 4 of table S2, where proteins with a
DONSON STRING association were assigned 
a value of 1 for the in_STRING_db column. All 
other proteins were assigned a value of 0. We 
re-sorted the table based on STRING associa- 
tion (STRING DB followed by AF metrics as 
described above). This sorting protocol resulted 
in new rankings that are presented in Sheet 4. 
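The STRING step can be sketched similarly. The URL builder assumes the public STRING REST API's `interaction_partners` endpoint with `identifiers`, `species` (9606 for human) and `required_score` parameters; check the current STRING API documentation before relying on it. The sketch does not send the request or map STRING identifiers to UniProt entries. The re-sort places STRING-associated partners first and then applies the AlphaFold metrics in the same order as above; the dictionary keys mirror the table column names but are otherwise hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlencode

# Assumed endpoint of the public STRING REST API; verify against current docs.
STRING_PARTNERS = "https://string-db.org/api/json/interaction_partners"


def string_partners_url(gene: str = "DONSON", species: int = 9606) -> str:
    """Build (but do not send) a STRING query with no minimum score."""
    params = {"identifiers": gene, "species": species, "required_score": 0}
    return f"{STRING_PARTNERS}?{urlencode(params)}"


def resort_with_string(rows: List[Dict], string_hits: Set[str]) -> List[Dict]:
    """Flag STRING-associated partners, put them first, then sort by AF metrics."""
    for row in rows:
        row["in_STRING_db"] = 1 if row["partner"] in string_hits else 0
    ordered = sorted(rows, key=lambda r: (-r["in_STRING_db"], -r["avg_n_models"],
                                          -r["pdockq"], r["avg_interface_pae"]))
    for rank, row in enumerate(ordered, start=1):
        row["rank"] = rank
    return ordered


print(string_partners_url())
# https://string-db.org/api/json/interaction_partners?identifiers=DONSON&species=9606&required_score=0
```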


AlphaFold-based modeling of protein structures 
in ChimeraX 


Fig. S4A: The complex shown in fig. S4A was 
assembled stepwise, as follows. AF-M was 
used to first predict the structure of a Xenopus 
DONSON (“#1” in figure), GINS, and MCM3 
complex (PAE plots shown in fig. S4B), and the 
resulting PDB file was opened in ChimeraX. 
Disordered DONSON residues (1-6, 26-154, 
330-369) were deleted, leaving only the glo- 
bular domain and GINS-binding peptide (resi- 
dues 7-25). MCM3 residues 659-807 were also 
deleted. Separately, TOPBP1 and GINS were 
folded together using AF-M (PAE plots for the 
human prediction shown in fig. S9; prediction
for Xenopus complex looks very similar,
as shown in fig. S9C). All TOPBP1 residues 
except 475-492 (GINI peptide) and 538-734 
(BRCT4-5) were deleted, and the resulting 
TOPBP1-GINS complex (GINS hidden) was
aligned with the PSF1 subunit of the above
DONSON-GINS-MCM3 complex. Separately,
two copies of DONSON (residues 155-579) and
1 copy of the TOPBP1 BRCT3 domain (residues
343-447) were folded with AF-M (PAE plot
shown in fig. S4D) and aligned to the
DONSON-GINS-MCM3 complex, revealing the position
of the second copy of DONSON (brick red), 
and demonstrating that the TOPBP1 BRCT3 
domain binds the DONSON dimer interface. 
Finally, DONSON was folded with POLE2 
(PAE plots shown in fig. S4C), all DONSON 
residues except those interacting with POLE2 
(65-72) were deleted. The POLE2-DONSON 
complex was not aligned with the rest of 
the complex but instead is shown separately in 
fig. S4A. 

Fig. S5A: Same as fig. S4A except all proteins 
were human, and the disordered residues 
deleted in DONSON were 1-6, 26-73, 83-154, 
325-352. MCM3 residues 659-808 were deleted. 
The BRCT3 domain of TOPBP1 comprises resi- 
dues 340-450. 

Fig. 2H: Top, same as fig. S4A (left side) but 
the second copy of DONSON and MCM3 were 
hidden. Bottom, the structure shown in the 
top panel was docked onto the human cryo-EM
replisome structure [PDB: 7PLO (66), all
but MCM2-7 and CDC45 were hidden] using
the common MCM3 subunit, and MCM3 from
the AlphaFold structure was hidden. 

Fig. 2I: Same as fig. S4A (right side) but the
second copy of DONSON and MCM3 were 
hidden. 

Figs. 3D and 4C: Same as fig. S4A (right side)
except that MCM3 and the second copy of 
DONSON were hidden. 


RMSD Calculation (fig. S7) 


The MCM3-DONSON-GINS complex predicted 
by AF-M was relaxed using AMBER
(https://ambermd.org/index.php), and hydrogen atoms
were removed with Coot (67). The structure 
was opened in ChimeraX together with 
PDB:7PLO, and the RMSD was calculated for 
MCM3 and the GINS complex between the
two structures. 


Mammalian cell culture 


All cells were grown at 37°C and 5% CO2, and
shown to be mycoplasma negative through 
routine testing. HCT116 cells (human colonic 
cancer cell line) were grown in McCoy’s 5A 
Modified media (Gibco Cat. No. 26600023) 
supplemented with 10% FCS (Gibco Cat. No. 
10270-106; lot 2078421), 1x penicillin and 2 mM
L-Glutamine. MEFs (derivation, see generation 
of knock-in mice section), were grown at 3% 
O2 in DMEM (Gibco Cat. 41965-039; lot 2340231)
supplemented with 10% FCS (Gibco Cat. No. 
10270-106; lot 2078421), 1 X penicillin/streptomycin, 
0.1.M 2-mercaptoethanol (Gibco Cat. No. 31350-010;
lot 2328476). mESCs (Mus musculus, 129/
Ola E14 parental cell line) were maintained in 
serum/LIF media containing G-MEM BHK-21 
(Invitrogen Cat. No. 21710-025), 10% FCS (Gibco 
Cat. No. 10270-106; lot 2078421), 1X sodium 
pyruvate (Sigma Cat. No. S8636), 1X MEM
Nonessential Amino Acid Solution (Sigma Cat. 
No. M7145), 1x penicillin and 2 mM L-Glutamine,
1:500 serum-conditioned media containing LIF 
and 0.5 uM β-mercaptoethanol (Gibco Cat.
No. 21985-023), fed every day and split every 
two days. Where indicated, 1 uM AZD6738
(AdooQ Bioscience Cat. No. A15794), an ATR
inhibitor (ATRi), and/or 2 mM hydroxyurea (HU)
(Sigma Cat. No H8627) were added to culture 
media for the times specified in the relevant 
experiments. 


DONSON-AID2 cell line 


To establish the HCT116 DONSON-AID2 cells, 
a parental cell line expressing OsTIR1(F74G)
was transfected with two CRISPR plasmids 
targeting the C-terminal coding region of the 
DONSON gene (targets: 5'-TTAGGCTTACTT- 
TGGTGTTC-3’, 5'-TTAGGCTTACTTTGATGTTC- 
3’) and a donor plasmid encoding mAID- 
mClover and a hygromycin resistant marker 
following a published protocol (38, 68). After 
selecting clones in the presence of hygromycin 
(100 ug/mL), bi-allelic insertion was confirmed
by genomic PCR. Subsequently, expression 
of DONSON-mAID-mClover protein was con- 
firmed by Western blotting. 


Cell synchronization of HCT116 cells 


Synchronization was performed as previously 
described (69). In brief, 1.0 to 1.5 x 10° cells 
were seeded into 6-well plates and grown for 
1 to 2 days until 50% confluent. G1 arrest was 
induced by 24 to 30 hours treatment with 
20 uM Lovastatin (Thermo Scientific Cat. 
No. 15590584). G1-arrested cells were washed
once with lovastatin-free medium and grown 
in medium containing 2 mM DL-mevalonolactone
(Sigma-Aldrich, M4667) with or without
5 uM 5-phenyl-1H-indole-3-acetic acid (5-Ph-IAA; 
MedChemExpress Cat No. HY-134653). 


Immunoblotting in mammalian cell experiments 


Total cell extracts were prepared in urea lysis 
buffer containing 8 M urea, 50 mM Tris-HCl, 
pH 7.5, 150 mM β-mercaptoethanol, protease
inhibitors and PhoSTOP (Roche Cat. No. 
04693132001 and 4906837001). Lysed samples 
were sonicated for 7 x 30 sec ON/OFF cycles using
a Bioruptor (Diagenode). Protein electropho- 
resis was performed using NuPAGE 10% or 4 
to 12% Bis-Tris mini protein gels (Invitrogen 
Cat. No. NP0336BOX, NP0301BOX) and MOPS 
running buffer (Cat. No. NP0001) at 80 to 130 V.
Wet transfer of proteins to Immobilon-FL PVDF 
membrane (Millipore Cat. No. IPFL00010) was
performed at 100 V for 60 to 75 min at 4°C. 
After transfer, membranes were washed in 
methanol, air-dried and re-activated in meth- 
anol, washed in 1X Tris-buffered saline/0.2% 
Tween-20 (Sigma Cat. No. P1379) (TBS-T), and
blocked in TBS-T/2.5% BSA (Roche Cat. No.
10735086001, lot 64758420) for 1 hour at
room temperature (RT). Blots were incubated
overnight (O/N) in TBS-T/2.5% BSA containing 
primary antibody. After 4 x 5 min washes in 
TBS-T, blots were incubated with secondary
antibodies (1:20,000 to 1:30,000) for 1 hour at
RT, washed 4 x 5 min in TBS-T and rinsed in
TBS before acquisition using a LI-COR Odyssey®
CLx imager. ImageStudio software was used 
for quantification. 

The following primary antibodies were used 
for immunoblotting mammalian samples: 

Mouse anti-Histone H2B, clone 5HH2-2A8 
(Millipore Cat. No. 05-1352), lot 3836574, 
1:20,000. RRID:AB_10807688 

Mouse anti-Alpha tubulin (a-Tub), clone 
B-5-1-2 (Sigma Cat. No. T6074), lot 037M4804V, 
1:10,000. RRID:AB_477582 

Mouse anti-GINS1 (Millipore Cat. No.
MABE2033), lot Q3876194, 1:1500

Rabbit anti-GINS2/Psf2 (Atlas Cat. No. 
HPA057285, lot A113986): 1:2500-10,000. RRID: 
AB_2683398 

Rabbit anti-CDC45 (D7G6; CST Cat. No. 
11881S), lot 1; 1:1,500. RRID:AB_2715569
Rabbit anti-pChk1-ser317 (CST Cat. No. 2344,
1:2500). RRID:AB_331488

Mouse anti-Chk1 (G-4; Santa Cruz Cat. No. 
sc-8408, 1:2000). RRID:AB_627257 

Rabbit anti-CDK2 (CST Cat. No. 2546). RRID:
AB_2276129 

Rabbit anti-Cyclin E1 (Proteintech Cat. No. 
11554-1-AP). RRID:AB_2071066 

Rabbit anti-hDONSON and mDonson (gener- 
ated in authors’ lab); 1:2000 and 1:750 respectively. 

The following secondary antibodies were 
used for immunoblotting mammalian samples: 

IRDye 680RD Goat anti-Rabbit IgG (H + L) 
Highly Cross-Adsorbed, 0.1 mg (925-68071). 
RRID:AB_2721181 

IRDye 800CW Goat anti-Mouse IgG (H + L) 
Highly Cross-Adsorbed, 0.1 mg (925-32210). 
RRID:AB_2687825 

IRDye 680RD Goat anti-Mouse IgG (H + L) 
Highly Cross-Adsorbed, 0.1 mg (925-68070). 
RRID:AB_2651128 

IRDye 800CW Goat anti-Rabbit IgG (H + L) 
Highly Cross-Adsorbed, 0.1 mg (925-32211). 
RRID:AB_2651127 


Generation of DONSON antibodies used for 
mammalian cell experiments 


A rabbit polyclonal antibody against human 
DONSON (hDONSON) was generated previously 
(23), raised against amino acid residues 1-125 of 
human DONSON, purified after expression in 
E. coli from the pGEX-6P-1 expression vector. 
Polyclonal antibody against mouse Donson was 
also raised in rabbits using full-length mouse 
Donson (mDonson), purified after expression 
in E. coli from a pET28a-His-SUMO expression 
vector. Antibodies were affinity-purified from 
rabbit sera (Eurogentec) and specificity was
established using lysates from siRNA-transfected 
cells, patient cells and knock-in mESCs. 


Mammalian cell fractionation 


Soluble and chromatin samples were prepared using
CSK buffer (70) (10 mM PIPES, pH 6.8, 300 mM
sucrose, 100 mM NaCl, 15 mM MgCl₂, 0.5%
Triton X-100, 1 mM ATP (VWR Cat. No. R04411),
1 mM DTT and 0.2 mM PMSF plus 1X protease
inhibitors and PhosSTOP). Harvested cells
were washed in cold phosphate-buffered saline
(PBS), resuspended in 500 μl CSK for 10 min
on ice, and then centrifuged at 1600 g for 6 min;
the supernatant was collected as the soluble fraction.
Pellets were washed twice by resuspending in 0.5
and 1 ml CSK buffer for 5 min and centrifuging at
1,600 g. Chromatin fractions were obtained
by resuspending the pellet in 500 μl 2X sample
loading buffer (2X SLB: 100 mM Tris-HCl
(pH 6.8), 4% SDS, 20% glycerol, 0.2% bromo-
phenol blue, 10% β-mercaptoethanol) and
sonicating for 15 × 30 s ON/OFF cycles on a Bio-
ruptor. 100 μl 6X SLB was added to the soluble
fraction, and equal volumes of the soluble
and chromatin fractions were then loaded for SDS-PAGE
and immunoblotting as outlined above.
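The loading-buffer arithmetic can be checked with C1V1 = C2V2. The sketch below assumes that the soluble fraction is about 500 μl, the CSK volume used for the first extraction. It ignores the small cell and pellet volumes, which the text does not report.

```python
def final_fold(stock_fold: float, stock_volume_ul: float, sample_volume_ul: float) -> float:
    """Final X-concentration after adding stock_volume_ul of a stock_fold-X buffer
    to sample_volume_ul of a sample that contains none of it (C1V1 = C2V2)."""
    total_volume_ul = stock_volume_ul + sample_volume_ul
    return stock_fold * stock_volume_ul / total_volume_ul


# Soluble fraction: ~500 ul CSK supernatant (assumed) + 100 ul 6X SLB
soluble = final_fold(6.0, 100.0, 500.0)
# Chromatin fraction: pellet resuspended directly in 2X SLB (pellet volume ignored)
chromatin = 2.0
print(f"soluble: {soluble:.1f}X SLB, chromatin: {chromatin:.1f}X SLB")
```

Under these assumptions the soluble fraction ends up in 1X SLB and the chromatin fraction in 2X SLB.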
Flow cytometry

Cells harvested for each time point from one well
of a 6-well plate were pelleted at 1200 g, re-
suspended in 75 μl PBS, fixed by adding
1 ml of freezer-cold 100% ethanol with gentle
vortexing, and stored at −20°C. Fixed cells
were pelleted by centrifugation at 1300 g for
5 min, washed in 1X PBS/0.1% Triton X-100
(PBS-T), resuspended in 1 ml of 2 μg/ml
DAPI/PBS-T and incubated for 1 to 16 hours at
4°C. For DNA content determination, cells
were pelleted at 1200 g and resuspended in
350 μl PBS for analysis on a CytoFLEX S analyzer
(Beckman Coulter) using the violet 405 nm laser and a
450/45 bandpass filter, with 20,000 events recorded in
the single-cell population gate. Data analysis was
performed using FlowJo v10.8.1 (FlowJo LLC,
BD), with G1, S and G2/M fractions quantified
using the Dean-Jett-Fox model and with the G1 and
G2 peaks constrained based on the histogram of
asynchronous cells. To assess EdU incorpora-
tion, cells were pulse labeled for 15 min with
40 μM 5-ethynyl-2′-deoxyuridine (EdU; Sigma
Cat. No. 900584-50mg) before harvesting and
fixation as above. Fixed cells were washed in
PBS-T and then resuspended in 200 μl click re-
action buffer (2 mM CuSO₄, 50 mM L-ascorbic
acid and 0.2 μl/ml Alexa Fluor® 488 azide [Invi-
trogen Cat. No. A10266; 0.2 g/μl solution] in
PBS-T; modified from (71)) for 30 min at RT,
washed in PBS-T, stained with DAPI as above
and then analyzed on a CytoFLEX S analyzer
with the 488 nm laser and a 525/50 filter for EdU
detection.


DNA combing 


Exponentially growing MEFs and mESCs were
pulse labeled by addition of 25 μM CldU
(Sigma, Cat. No. C6891) for 20 min, washed with
pre-warmed PBS and then pulsed with 125 μM
IdU (Sigma, Cat. No. I7125) for 20 min. After
trypsinization, 6x10° cells were used to cast 3
agarose (Biorad, Cat. No. 1613111) plugs per
condition and processed for DNA combing
according to a previously described protocol
(72, 73), omitting the SCE buffer plug digestion
steps. IdU and CldU were detected using
mouse anti-BrdU (BD, Cat. No. 347580; RRID:
AB_400326) and rat anti-BrdU (Abcam, Cat.
No. ab6326; RRID:AB_305426), respectively.
DNA was detected using an anti-ssDNA antibody
(Millipore, MAB3034; RRID:AB_94645). Images
were acquired on a widefield microscope (Zeiss
Axiophot) with a 63X or 40X lens. The 2.33
kbp/μm stretching factor (μm-to-kbp conver-
sion) was obtained by combing and measuring
bacteriophage lambda DNA. Measure-
ments and analysis were performed using
ImageJ. Fork speed was obtained by
dividing the length of the IdU tracks adjacent
to CldU tracks (ongoing forks) by the IdU in-
cubation time (20 min) and is expressed in kb/
min. Fork asymmetry is presented as the ratio of
left to right IdU track lengths. Interorigin distances
(IODs) correspond to the distance (in kb) be-
tween the center points of adjacent bidirec-
tional replication origins.
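The fiber measurements above can be turned into kb-based metrics with a few lines of code. This is a minimal sketch that assumes track lengths and origin center positions have already been measured in μm (for example in ImageJ). Only the 2.33 kbp/μm factor and the 20 min IdU pulse come from the text. The function names and the example lengths are hypothetical.

```python
from __future__ import annotations

KBP_PER_UM = 2.33      # lambda-DNA stretching factor stated in the text
IDU_PULSE_MIN = 20.0   # IdU labelling time in minutes


def um_to_kb(length_um: float) -> float:
    """Convert a fibre length measured in micrometres to kilobases."""
    return length_um * KBP_PER_UM


def fork_speed_kb_per_min(idu_track_um: float, pulse_min: float = IDU_PULSE_MIN) -> float:
    """Speed of an ongoing fork: IdU track length (next to a CldU track) / IdU time."""
    return um_to_kb(idu_track_um) / pulse_min


def fork_asymmetry(left_idu_um: float, right_idu_um: float) -> float:
    """Left/right IdU track ratio for a pair of sister forks (1.0 = symmetric)."""
    return left_idu_um / right_idu_um


def inter_origin_distances_kb(origin_centres_um: list[float]) -> list[float]:
    """Distances between centre points of adjacent bidirectional origins on one fibre."""
    centres = sorted(origin_centres_um)
    return [um_to_kb(b - a) for a, b in zip(centres, centres[1:])]


if __name__ == "__main__":
    print(fork_speed_kb_per_min(10.0))                   # hypothetical 10 um IdU track
    print(fork_asymmetry(8.0, 10.0))                     # hypothetical sister-fork tracks
    print(inter_origin_distances_kb([100.0, 0.0, 40.0]))  # hypothetical origin centres
```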

Fork elongation experiment (fig. S16). Asyn- 
chronous DONSON-AID2 HCT116 cells were 
pulsed with 25 μM CldU for 20 min, washed
with warm medium, and incubated in fresh medium
containing 125 μM IdU with either vehicle or
15 μM 5-Ph-IAA for 25 min. Cells were
subsequently trypsinized and processed for DNA
combing as described above. 


Growth curves 


MEFs (1.5 x 10° cells) were seeded on day 0 
into a T25 flask, split and counted every 3 
days, and 1.5 x 10° cells were reseeded into a 
new flask. mESCs (5 x 10° cells) were seeded 
on day 0 into a T25 flask and split, counted 
and reseeded every 48 hours at the same
density. MEFs and mESCs were grown in 3% O₂.
Counts were measured in duplicate using a
Countess automated cell counter according to
the manufacturer’s instructions. Doubling times
were calculated during log-phase growth (day
3 to 36 for MEFs and day 2 to 16 for mESCs)
using the formula $t/\log_2(e/b)$, where t = time
in hours, e = final population size and b = pop-
ulation size at the start of log-phase growth.
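The doubling-time formula can be written as a short function. Because the cells are reseeded at a fixed number at each split, this sketch assumes that e is the extrapolated cumulative population, obtained by multiplying the fold expansions of successive passages. The text does not spell out this step, so treat it as an assumption. The exponent of the seeding number is illegible in the source, so the 1.5 × 10^5 cells in the example is a hypothetical value.

```python
from __future__ import annotations

import math


def cumulative_population(cells_seeded: float, counts_at_split: list[float]) -> float:
    """Extrapolated population when the same number of cells is reseeded at every
    split: multiply the starting number by each passage's fold expansion."""
    total = cells_seeded
    for count in counts_at_split:
        total *= count / cells_seeded
    return total


def doubling_time_hours(t_hours: float, start_size: float, final_size: float) -> float:
    """Doubling time = t / log2(e / b) for growth from b (start) to e (final)."""
    if start_size <= 0 or final_size <= start_size:
        raise ValueError("population must grow for a doubling time to be defined")
    return t_hours / math.log2(final_size / start_size)


if __name__ == "__main__":
    seeded = 1.5e5                                      # hypothetical seeding number
    e = cumulative_population(seeded, [6e5, 6e5])       # two hypothetical 72 h passages
    print(doubling_time_hours(144.0, seeded, e))
```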


Generation of Donson(M440T/M440T) knock-in mice
using CRISPR


Mouse studies were approved by the Univer- 
sity of Edinburgh animal welfare and ethical 
review board and conducted according to UK 
Home Office regulations under UK Home 
Office project licenses P2A477A62 and PP2060675. 
Fertilized eggs were injected with a CRISPR
mix targeting mDonson containing 50 ng/μl
Cas9 mRNA (Trilink, Cat. No. L-6125), 25 ng/μl
gRNA (TACCTTAAGCATTTGCATTG) and
150 ng/μl single-stranded oligodeoxynucleotide
repair template (ssODN, IDT Ultramer,
5′-AAATTGCTGGACTGTATGTAGATGAAGTAAACACTTACCTTAAGCATTTGCGTTGATGCACCTCGGAAGGCTATTGGGGATAAGAGAGTGGGTGGGAGACCTGCCTGT-3′)
in nuclease-free water. Injected eggs were cultured overnight
up to the two-cell stage and transferred to the
oviducts of pseudo-pregnant females. Although
the repair template contained a silent PAM site
mutation (c.1314C>A) in addition to the desired
c.1319T>C substitution (p.M440T), a founder
mouse without the PAM site mutation was
selected to establish the mouse line, which
was maintained on a mixed CD1/BL6CBAF1
background. WT and Donson(M440T/M440T) MEFs
(Mus musculus, mixed background CD1/
BL6CBAF1) were derived from embryos and
immortalized using CRISPR to delete p53.
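The guide and repair template above can be compared directly. This sketch slides the 20-nt guide along the ssODN as written, in the same orientation only, and reports mismatches in the best-matching window together with the triplet immediately 3′ of it. It is an illustrative check, not the design procedure the authors used. With these sequences the best window differs from the guide at a single position (guide A, template G). The adjacent triplet in the template is ATG rather than an NGG motif, consistent with the PAM-site mutation that the template is stated to carry.

```python
from __future__ import annotations

GUIDE = "TACCTTAAGCATTTGCATTG"
SSODN = (
    "AAATTGCTGGACTGTATGTAGATGAAGTAA"
    "ACACTTACCTTAAGCATTTGCGTTGATGCAC"
    "CTCGGAAGGCTATTGGGGATAAGAGAGT"
    "GGGTGGGAGACCTGCCTGT"
)


def best_guide_match(guide: str, template: str) -> tuple[int, list[int]]:
    """Slide the guide along the template (same orientation only) and return the
    start index and the 0-based mismatch positions of the best-matching window."""
    best_start, best_mismatches = -1, list(range(len(guide) + 1))
    for start in range(len(template) - len(guide) + 1):
        window = template[start:start + len(guide)]
        mismatches = [i for i, (g, t) in enumerate(zip(guide, window)) if g != t]
        if len(mismatches) < len(best_mismatches):
            best_start, best_mismatches = start, mismatches
    return best_start, best_mismatches


if __name__ == "__main__":
    start, mismatches = best_guide_match(GUIDE, SSODN)
    end = start + len(GUIDE)
    for i in mismatches:
        print(f"guide position {i + 1}: {GUIDE[i]} -> {SSODN[start + i]}")
    print("template triplet 3' of the protospacer:", SSODN[end:end + 3])
```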


Embryo measurements 


Dissected embryos were washed in ice-cold
PBS for 5 min, fixed with ice-cold 4%
paraformaldehyde/PBS overnight at 4°C,
washed several times in PBS, and imaged using
a stereomicroscope. In lateral images, occipital-
frontal distance was measured from nasion to
occiput and embryo length was measured
from bregma to the most caudal point of the
embryo (excluding hindlimbs and tail); scale
bars are indicated in figures and legends. For cor-
tical thickness analysis, embryos were cryo-
preserved in 30% sucrose/PBS and embedded in
Tissue-Tek OCT (Sakura Cat. No. 4583), and
12 μm coronal cryostat sections were obtained
and stored at −70°C. After PBS washes, sec-
tions containing brain tissue were stained for
10 min in 0.1 μg/ml DAPI solution in PBS for
5 min, washed in PBS and mounted using
Mowiol 4-88 (Sigma Cat. No. 81381). Embryos/
samples were allocated to groups on the basis of geno-
type. No statistical method was used to pre-
determine sample size. Sample size was chosen
based on standard practices in the field;
measurements were not blinded, and no data
were excluded.


Alcian blue and Alizarin red staining


Embryos were fixed in 100% ethanol for 2
days, placed in acetone for 3 days, and stained
overnight at 37°C in staining solution containing
1 ml 0.3% alcian blue (Sigma Cat. No. 05500) in
70% ethanol, 1 ml 0.1% alizarin red (Sigma Cat.
No. A5533) in 95% ethanol, 1 ml glacial acetic
acid and 17 ml 70% ethanol. After 2 washes
in ethanol, embryos were cleared in 1% KOH
(w/v), then in 1% KOH/10% glycerol, and finally stored
in 10% glycerol/PBS until imaging using a
stereomicroscope (Leica).
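The final composition of the 20 ml staining solution follows from a volume-weighted average of its four components. This sketch assumes that volumes are additive, ignoring the slight contraction when ethanol and water mix, and treats glacial acetic acid as 100%. It gives about 0.015% alcian blue, 0.005% alizarin red, 5% acetic acid and 67.75% ethanol.

```python
# Component: (volume in ml, % alcian blue, % alizarin red, % acetic acid, % ethanol),
# taken from the staining recipe in the text.
COMPONENTS = {
    "0.3% alcian blue in 70% ethanol": (1.0, 0.3, 0.0, 0.0, 70.0),
    "0.1% alizarin red in 95% ethanol": (1.0, 0.0, 0.1, 0.0, 95.0),
    "glacial acetic acid": (1.0, 0.0, 0.0, 100.0, 0.0),
    "70% ethanol": (17.0, 0.0, 0.0, 0.0, 70.0),
}
LABELS = ["alcian blue", "alizarin red", "acetic acid", "ethanol"]


def final_percentages(components):
    """Return the total volume (ml) and the volume-weighted final percentage of
    each constituent, assuming volumes simply add."""
    total_ml = sum(parts[0] for parts in components.values())
    amounts = [0.0, 0.0, 0.0, 0.0]
    for volume_ml, *percents in components.values():
        for i, pct in enumerate(percents):
            amounts[i] += volume_ml * pct  # amount = volume x concentration
    return total_ml, [amount / total_ml for amount in amounts]


if __name__ == "__main__":
    total, pcts = final_percentages(COMPONENTS)
    print(f"total volume: {total} ml")
    for label, pct in zip(LABELS, pcts):
        print(f"{label}: {pct:.4g}%")
```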


Statistical methods used in mammalian 
cell experiments 


Statistical testing was performed using Graph-
Pad Prism v9.1.1. Two-sided parametric (t tests)
or nonparametric Mann-Whitney U tests were
performed for quantitative measurements as
indicated in figure legends. Two-way ANOVA
tests were performed for S-phase-entry assays,
with significance (P values) indicated on figures.
The number of samples and/or experimental rep-
licates is indicated on figures or in legends. No ex-
clusion criteria were pre-established. No data/
sample points were omitted.
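The statistics were computed in Prism. For readers unfamiliar with the Mann-Whitney test, the sketch below shows how its U statistic is defined: each cross-sample pair in which the first value is larger scores 1, and each tie scores one half. It computes only the statistic, not the P value. If SciPy is available, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` returns both.

```python
from __future__ import annotations


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for sample x versus y: the number of (x_i, y_j) pairs with
    x_i > y_j, counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_u(x: list[float], y: list[float]) -> float:
    """Smaller of U_x and U_y; note that U_x + U_y = len(x) * len(y)."""
    u_x = mann_whitney_u(x, y)
    return min(u_x, len(x) * len(y) - u_x)


if __name__ == "__main__":
    print(mann_whitney_u([4.0, 5.0, 6.0], [1.0, 2.0, 3.0]))
    print(two_sided_u([1.0, 2.0], [2.0, 3.0]))
```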


MOLECULAR BIOLOGY 


DNSN-1 recruits GINS for CMG helicase assembly
during DNA replication initiation in
Caenorhabditis elegans


Yisui Xia†, Remi Sonneville†, Michael Jenkyn-Bedford†, Liqin Ji, Constance Alabert, Ye Hong*,
Joseph T. P. Yeeles*, Karim P. M. Labib*


INTRODUCTION: Eukaryotic chromosomes are 
duplicated by a molecular machine known as 
the replisome, the assembly of which is highly 
regulated to ensure that cells make a single 
copy of their genome during each cell cycle. In 
humans, defects in replisome assembly are 
often associated with early cancer development 
and can lead to a form of microcephalic pri- 
mordial dwarfism called Meier-Gorlin syndrome. 
Eukaryotic replisome assembly is initiated by 
the assembly and activation of the 11-subunit 
helicase called CMG (CDC-45-MCM-2-7-GINS), 
around which the replisome forms. First, two 
rings of the six adenosine triphosphatases 
(ATPases) known as MCM-2-7 (comprising the 
MCM-2 to MCM-7 proteins) are assembled
around double-stranded DNA to form double 
hexamers at replication origins at the end of 
mitosis. Second, CDC-45 and the four-protein 
GINS complex are recruited to MCM-2-7 double 
hexamers during S phase to form a pair of CMG 
helicases at the heart of two nascent replisomes, 
in a process that is controlled by several protein 
kinases and multiple assembly factors. Finally, 
the two helicases are activated in a poorly under-
stood step, in which the MCM-2-7 rings are
opened transiently to exclude one DNA strand.


RATIONALE: The mechanism of CMG helicase 
assembly and activation has been studied most 
intensively using budding yeast, for which 
the entire cycle of DNA replication has been 
reconstituted with purified proteins. How- 
ever, recent evidence indicates considerable 
evolutionary diversification in the factors that 
mediate and control CMG assembly. For ex- 
ample, the Cdc7 kinase is essential for helicase 
assembly in budding yeast, yet CDC7 is dis- 
pensable in untransformed mouse and human 
cells. Furthermore, the yeast Sld2 protein is an 
essential helicase assembly factor with
homology to vertebrate RECQL4, but studies of
Xenopus RECQL4 indicate that it acts after
CMG assembly. These findings suggest that 
animal cells contain additional helicase as- 
sembly factors that remain to be identified. 


DNSN-1 recruits GINS for CMG helicase assembly during DNA replication initiation in C. elegans. Upon
entry into S phase, CDC-45 and GINS are recruited by multiple assembly factors to double hexamers of the MCM-2-7
ATPases to form pairs of CMG helicases at bidirectional replication forks. DNSN-1 is a dimeric protein that is required
for recruitment of GINS and functions in a complex with the BRCT-domain protein MUS-101/TOPBP1.

RESULTS: Using the embryo of the nematode
Caenorhabditis elegans as a model for study-
ing metazoan replisome assembly, we show
that DNSN-1 (the ortholog of human DONSON)
is essential for the initiation of
DNA replication but is dispensable for the
subsequent progression of replication forks.
Our data indicate that the BRCT-repeat pro- 
tein MUS-101/TOPBP1 recruits DNSN-1 to the 
preinitiation complexes that form on MCM-2- 
7 double hexamers during S phase, thereby 
triggering assembly of the CMG helicase. 

We show that DNSN-1 is dispensable for 
recruitment of CDC-45 to chromatin but is es- 
sential for the loading of GINS. Cryo-electron 
microscopy demonstrates that a dimer of 
DNSN-1 binds simultaneously to multiple sites 
on GINS and the MCM-3 helicase subunit, 
indicating that DNSN-1 positions GINS to pro- 
mote CMG assembly during DNA replication 
initiation. Consistent with this view, we show 
that deletion of one of the binding sites on 
DNSN-1 for GINS leads to initiation defects, 
whereas point mutations in DNSN-1 that dis- 
rupt one of the interfaces with MCM-3 are 
lethal and block recruitment of DNSN-1 to 
preinitiation complexes during S phase. 

In budding yeast, the Mcm10 protein is re-
quired for activation of the nascent CMG heli-
case complexes. We show that near-complete
deletion of the mcm-10 coding sequence by
CRISPR-Cas9 is viable in C. elegans, indicating
that other factors can also contribute to heli-
case activation in metazoa. Our data suggest
that DNSN-1 might also play a role during this
step, because RNAi depletion of DNSN-1 is syn-
thetic lethal with mcm-10Δ.

Orthologs of DNSN-1 are found in animals 
and plants but are absent from budding yeast 
and many fungal species. This suggests that 
the role of DNSN-1/DONSON during DNA re- 
plication initiation emerged at an early step 
of eukaryotic evolution but was subsequently 
lost during fungal evolution.


CONCLUSION: Our findings identify DNSN-1/ 
DONSON as a missing link in our understand-
ing of DNA replication initiation in animal
cells, with an essential role in CMG helicase
assembly. The requirement for DNSN-1 during 
DNA replication initiation indicates substan- 
tial differences in the mechanism of replisome 
assembly between metazoa and budding yeast. 
Consistent with the essential role of DNSN-1 
during CMG helicase assembly in C. elegans, 
mutations in human DONSON lead to Meier- 
Gorlin syndrome. 


Assembly of the CMG (CDC-45-MCM-2-7-GINS) helicase is the key regulated step during eukaryotic DNA 
replication initiation. Until now, it was unclear whether metazoa require additional factors that are not 
present in yeast. In this work, we show that Caenorhabditis elegans DNSN-1, the ortholog of human DONSON, 
functions during helicase assembly in a complex with MUS-101/TOPBP1. DNSN-1 is required to recruit the 
GINS complex to chromatin, and a cryo-electron microscopy structure indicates that DNSN-1 positions GINS on 
the MCM-2-7 helicase motor (comprising the six MCM-2 to MCM-7 proteins), by direct binding of DNSN-1 to 
GINS and MCM-3, using interfaces that we show are important for initiation and essential for viability. These 
findings identify DNSN-1 as a missing link in our understanding of DNA replication initiation, suggesting 
that initiation defects underlie the human disease syndrome that results from DONSON mutations. 


The initiation of chromosome replication
is highly regulated in eukaryotic cells to
ensure that a single copy of each chro-
mosome is produced in every cell cycle
(1, 2). Initiation is frequently misregulated
in human cancer and represents an interest-
ing target for new anticancer therapeutics
(3, 4). The key regulated step during initia-
tion is the assembly at multiple replication
origins of the DNA helicase known as CMG
(CDC-45-MCM-2-7-GINS), around which the
replisome assembles (5-7). CMG unwinds the
parental DNA duplex at a replication fork and
associates stably with the DNA template until
neighboring forks converge during DNA rep-
lication termination. CMG is then ubiquity-
lated and disassembled by the CDC-48/p97
adenosine triphosphatase (ATPase) with its
ubiquitin receptors UFD-1 and NPL-4 (1, 8, 9).
The mechanism by which eukaryotic DNA 
replication initiates is best characterized in 
the budding yeast Saccharomyces cerevisiae 
(fig. S1), and was initially analyzed by cellular 
studies and more recently studied through re- 
constituted CMG assembly reactions with puri-
fied budding yeast proteins (7, 10-12). These
studies showed that the motor of the CMG 
helicase comprises a heterohexameric ring of 
the Mcm2-7 ATPases (comprising the six Mcm2 
to Mcm7 proteins). Two such rings are loaded 
in a concerted fashion around double-stranded 
DNA (dsDNA) at replication origins during G1
phase (step 1 in fig. S1), thereby forming a 
head-to-head double hexamer of Mcm2-7 that 
lacks helicase activity (13, 14). Upon entry into 
S phase, Cdc7 kinase (also known as Dbf4- 
dependent kinase or DDK) phosphorylates 
Mcm2-7 double hexamers (step 2 in fig. S1) to 
create binding sites for Sld3 that recruits Cdc45 
(7, 12, 15-17). Meanwhile, CDK phosphorylates 
both Sld3 and Sld2 (step 3 in fig. S1) at sites 
that are recognized by pairs of BRCT domains 
in the amino terminus (binds phospho-Sld3) 
and carboxyl terminus (binds phospho-Sld2) 
of Dpb11 (18, 19). In this way, Dpb11 bridges
phospho-Sld3 and Cdc45 on Mcm2-7 double 
hexamers to phospho-Sld2 that associates with 
GINS and DNA polymerase ε (20, 21), leading
to the formation of a preinitiation complex 
(step 4 in fig. S1). Recruitment of Cdc45 and 
GINS triggers the assembly of two CMG heli-
case complexes (step 5 in fig. S1), with their
Mcm2-7 rings still encircling dsDNA (22-24). 
Subsequently, the Mcm10 protein is required 
for a poorly understood step in which the 
Mcm2-7 ring is opened transiently to exclude 
one DNA strand (22, 23), leading to full activ- 
ation of the helicase (step 6 in fig. S1). 

Until now, the molecular details of CMG as- 
sembly and activation had not been elucidated, 
or the reactions reconstituted with purified 
proteins, for species other than budding yeast. 
A functional human replisome can be assem- 
bled in vitro if the initiation step is bypassed 
by mixing purified human CMG with repli-
cation fork DNA and purified orthologs of
yeast replisome components (25). However, 
there is evidence for diversification and addi- 
tional complexity among metazoan initiation 
factors when compared with the predictions 
of the budding yeast model. CDC7 kinase was 
found to be dispensable in untransformed 
mammalian cells because of redundancy with 
CDK1-Cyclin B (26). Moreover, the RECQL4 
helicase has an amino-terminal region with 
homology to yeast Sld2 and is important for 
replication in Drosophila (27), Xenopus laevis 
egg extracts (28), and chicken DT40 cells (29), 
yet studies of Xenopus RECQL4 indicate a 
role after CMG assembly rather than before 
(28, 30). Finally, the metazoan ortholog of yeast
Dpb11, known variously as TOPBP1 or MUS-
101, contains more BRCT domains than Dpb11
(31), and a study of Xenopus TOPBP1 indicated
that BRCT4 to BRCT5 are dispensable for rep-
lication initiation (32), despite containing the
presumed binding site for RECQL4. By con- 
trast, BRCT3 of Xenopus TOPBP1 was found 
to be essential for replication initiation in ad- 
dition to BRCT1 and BRCT2, which bind to 
CDK-phosphorylated TRESLIN, the ortholog 
of yeast Sld3 (32). However, a partner for 
BRCT3 of TOPBP1/MUS-101 was not identified 
in previous studies. These findings suggested 
that additional initiation factors might remain 
to be identified in metazoan cells and, po- 
tentially, also in other eukaryotes. In this work, 
we show that the Caenorhabditis elegans pro- 
tein DNSN-1 is essential for replication initia- 
tion and acts together with MUS-101 during 
CMG helicase assembly. These findings pro- 
vide a model for understanding the human 
ortholog of DNSN-1, known as DONSON (down- 
stream neighbor of SON), which was previ- 
ously identified as a genome stability factor 
that is mutated in microcephalic primordial 
dwarfism (33, 34). 


Results 
DNSN-1 binds MUS-101 in C. elegans early 
embryos and copurifies with the CMG helicase 


To screen for regulators of the CMG helicase in
the C. elegans embryo, which is a rich source of
replisomes, we used worms in which the PSF-1
subunit of GINS was tagged with green fluo- 
rescent protein (GFP) (35) to isolate GINS and 
substoichiometric partners on anti-GFP beads 
(fig. S2, A and B), including the CMG helicase 
and associated factors (Fig. 1A). Extracts of 
control embryos were compared with extracts 
of GFP-psf-1 embryos from worms that were
either untreated or exposed to cdc-45 RNA in- 
terference (RNAi) to block replisome assembly 
or treated with npl-4 RNAi to block CMG dis- 
assembly and cause accumulation of postter- 
mination CMG with polyubiquitylated MCM-7 
subunit (lane 8 in Fig. 1A), without affecting 
the level of GINS (35). 
Fig. 1. DNSN-1 associates with the C. elegans replisome and with MUS-101.
(A) Control (N2) and GFP-psf-1 (KAL1) worms were fed on bacteria expressing the
indicated RNAi treatments before preparation of embryonic cell extracts and
immunoprecipitation (IP) on beads coated with anti-GFP antibodies. The indicated
factors were detected by immunoblotting. (B) Embryo extracts from the indicated
strains [N2, KAL1, and GFP-dnsn-1 (KAL213)] were processed as above in (A).
(C) Worms expressing GFP-DNSN-1 (KAL213) were grown on bacteria expressing the
indicated RNAi and then processed as described above in (A). (D) The interaction
between C. elegans DNSN-1 and the indicated fragments of MUS-101 was monitored by
the yeast two-hybrid assay. Yeast cells expressing fusions to the Gal4 activation
domain (AD) or DNA binding domain (DBD) were grown on nonselective medium or
selective medium, as described in the Materials and methods. wt, wild type.

Mass spectrometry analysis of anti-GFP pull-
downs (table S1A and data S1), validated sub-
sequently by immunoblotting (Fig. 1, A and B),
confirmed the presence of all 11 CMG subunits
in the GFP-PSF-1 samples, together with a set
of replisome components that were previously
shown to associate with the CMG helicase in
C. elegans embryo extracts (35). DNSN-1 also
copurified with GFP-PSF-1 (lane 6 in Fig. 1A)
in a manner that was enhanced when npl-4
RNAi was used (lane 8 in Fig. 1A). This sug-
gests that DNSN-1 might interact not only with
GINS but also with other components or part-
ners of the CMG helicase. The specificity of these
interactions was confirmed by analogous pull-
downs of GFP-tagged DNSN-1 from embryo
extracts that were analyzed by immunoblot-
ting (Fig. 1B and fig. S2C) and mass spectrom-
etry (fig. S2D, table S2, and data S2).

The initiation factor MUS-101 was also en-
riched in pull-downs of GFP-PSF-1 from em-
bryos treated with npl-4 RNAi (Fig. 1A and
table S1), indicating that posttermination CMG
complexes are a useful tool for identifying
additional replisome-binding proteins. More-
over, MUS-101 was enriched in pull-downs of
GFP-tagged DNSN-1 (Fig. 1B and table S2),
even after cdc-45 RNAi treatment that impaired
the interaction of DNSN-1 with CMG compo-
nents (Fig. 1C). These data suggest that DNSN-1
is a partner of MUS-101. A yeast two-hybrid
assay indicated a direct interaction that was
dependent upon BRCT3 of MUS-101 (Fig. 1D),
consistent with predictions that we made using
AlphaFold-Multimer (36), as described below.


DNSN-1 is essential for the initiation of
DNA replication


C. elegans DNSN-1 had not previously been
linked to chromosome replication, and RNAi
depletion was reported to cause behavioral
phenotypes linked to neuromuscular defects
(37). However, mutation of the Drosophila
ortholog, known as humpty dumpty, pro- 
duced DNA synthesis defects (38). Moreover, 
mutations in human DONSON cause micro- 
cephalic dwarfism and associated DNA repli- 
cation defects and DNA damage (33, 34). In 
addition, human DONSON copurified with 
several replisome factors, including subunits 
of the CMG helicase (34, 39). 

We deleted 87% of the dnsn-1 coding region
by CRISPR-Cas9 (fig. S3A) and found that
homozygous dnsn-1Δ was lethal during larval
development (fig. S3A). Similarly, we deleted
almost the entire coding region of tres-1 (fig.
S3B; worm TRESLIN), mus-101 (fig. S3C), and
sld-2 (fig. S3D) and found that worms homo-
zygous for the deleted alleles were sterile (fig.
S3), indicating a cell proliferation defect that
was consistent with previous data showing
that injection of RNAi specific to sld-2 and tres-
1/sld-3 caused embryonic lethality (40, 41).

Worms fed on bacteria that expressed long 
double-stranded RNA (dsRNA) corresponding 
to dnsn-1, tres-1, mus-101, or sld-2 remained
viable [see below and (40-42)], consistent with 
previous data from large-scale RNAi feeding 
screens in C. elegans (37, 43). This likely reflects 
residual protein that can still provide some 
level of function, despite efficient depletion, 
and we found that a small amount of DNSN-1 
protein remained in the nucleus (Fig. 2A) when 
GFP-dnsn-1 worms were fed on bacteria ex-
pressing dnsn-1 RNAi. By contrast, RNAi spe-
cific to the GFP tag caused further depletion
of nuclear GFP-DNSN-1 (Fig. 2A) and led to a
total loss of viability in GFP-dnsn-1 worms,
without affecting control animals (Fig. 2B).
Together with the lethal phenotype of
dnsn-1Δ, these findings demonstrate that DNSN-1 is
essential for viability in C. elegans and indicate 
that dnsn-1 RNAi and GFP RNAi can be used 
to achieve milder and stronger depletion, re- 
spectively, of GFP-tagged DNSN-1 for pheno- 
typic analysis. 

To examine the role of DNSN-1 during DNA 
replication in the C. elegans embryo, we ex-
posed embryonic cells from GFP-dnsn-1 em-
bryos to a pulse of the nucleoside precursor
5-ethynyl-2’-deoxyuridine (EdU). Incorpora-
tion of EdU into genomic DNA was impaired
by dnsn-1 RNAi and strongly inhibited by GFP
RNAi (Fig. 2, C and D), indicating that DNSN-1 
is essential for chromosome replication. To 
determine at which stage DNSN-1 acts, we ex- 
posed embryonic cells to a pulse of EdU as 
above after partial depletion of DNSN-1, TRES-1, 
or MUS-101 by RNAi and then used DNA 
combing (44) to monitor origin firing and the 
progression of DNA replication forks (Fig. 2E). 
Whereas fork progression was not signifi- 
cantly affected in worms treated with RNAi 
to dnsn-1, tres-1, or mus-101, the spacing be- 
tween replication forks increased in all cases 
(Fig. 2F), corresponding to a defect in origin 
firing. These findings indicate that DNSN-1
is dispensable for fork progression but is re- 
quired for the initiation of DNA replication 
in C. elegans, together with TRES-1, MUS-101 
(Fig. 2F), and SLD-2 (40). 


MUS-101-dependent association of DNSN-1 with 
preinitiation complexes 


The C. elegans early embryo divides rapidly, 
and DNA replication initiates on condensed 
chromatin in the first cell cycle. CDC-45 is de- 
tected on the six condensed chromosomes at 
the end of the second meiotic division, and its 
chromatin association is dependent upon MCM- 
2-7 proteins (45). Previous work suggests that 
the CMG helicase is required for rapid chro- 
mosome decondensation in the C. elegans 
early embryo (45). Although chromosome de- 
condensation proceeds normally when DNA 
synthesis is inhibited downstream of CMG as- 
sembly by RNAi depletion of the single-stranded 
DNA (ssDNA) binding protein RPA, the ribo- 
nucleotide reductase RNR-1, or the polymerase 
accessory factor PCN-1 (PCNA), RNAi depletion 
of cdc-45 causes a profound delay in chromo- 
some decondensation upon entry into S phase 
of the first embryonic cell cycle (45). These 
findings suggest that the DNA unwinding ac- 
tivity of the CMG helicase at replication forks 
drives rapid chromosome decondensation in 
the early embryonic cell cycles (Fig. 3A). 

When cells entered S phase in the absence 
of CDC-45, both DNSN-1 (cdc-45 RNAi in Fig. 
3, A to C) and MUS-101 (cdc-45 RNAi in Fig. 3, 
D and E) were detected on the chromosomes 
that remained condensed during early S phase. 
By contrast, depletion of MCM-2-7 (mcm-7 RNAi 
in Fig. 3, B and D), or codepletion of CDC-45 
and MCM-2-7 (cdc-45 + mcm-7 RNAi in Fig. 3,
C and E), delayed chromosome decondensa- 
tion without detectable chromatin association 
of DNSN-1 or MUS-101. These data indicate 
that DNSN-1 and MUS-101 associate with pre- 
initiation complexes that contain the loaded 
MCM-2-7 ATPases, which persist on condensed 
chromosomes when CMG assembly is blocked. 
Consistent with this view, DNSN-1 was also 
observed on condensed chromosomes during 
early S phase after depletion of PSF-1 (psf-1 
RNAi in fig. S4). However, DNSN-1 was not 
detected on condensed chromosomes after 
depletion of MUS-101 (mus-101 RNAi in fig. 
S4). Because MUS-101 is a partner of DNSN-1 
(Fig. 1), this indicates that MUS-101 helps to 
recruit DNSN-1 to the preinitiation complexes 
that assemble on MCM-2-7 double hexamers 
before CMG helicase assembly. 


DNSN-1 is required for chromatin recruitment of 
GINS but not CDC-45 during CMG assembly 


Upon entry into S phase of the first embryonic 
cell cycle, GFP-PSF-1 was observed on con- 
densed chromosomes (control in Fig. 4A), but 
this was lost upon depletion of CDC-45 (cdc-45 


22 September 2023 


RNAi in Fig. 4A), indicating that chromatin- 
bound PSF-1 corresponds to assembled CMG 
helicase complexes. Depletion of GFP-PSF-1 
delayed chromosome decondensation upon 
entry into the first embryonic S phase as ex- 
pected (GFP RNAi in Fig. 4B). However, CDC-45 
was still detected on condensed chromosomes 
in the absence of PSF-1 under such conditions 
(GFP RNAi in Fig. 4C) and this was dependent 
upon the MCM-2-7 complex (GFP + mcm-7 
RNAi in Fig. 4C). This indicated that the initial 
recruitment of CDC-45 to chromatin-loaded 
MCM-2-7 in early S phase does not require 
GINS and can be monitored independently. 
Therefore, the C. elegans early embryo pro- 
vides a useful experimental system to distin- 
guish which metazoan initiation factors are 
required for recruitment of CDC-45 or GINS to 
MCM-2-7 double hexamers during DNA repli- 
cation initiation. 

Consistent with its role during initiation 
(Fig. 2), and its presence in preinitiation com- 
plexes with MUS-101 (Fig. 3 and fig. S4), 
depletion of DNSN-1 delayed chromosome 
decondensation during early S phase (fig. S5), 
suggesting that DNSN-1 is required for some 
aspect of CMG helicase assembly. However, 
mCherry-CDC-45 was still recruited to con- 
densed chromosomes during early S phase 
in embryos lacking GFP-DNSN-1 (Fig. 4D; 
100% embryos, n = 5), analogous to the effect 
of depleting GFP-PSF-1. The same was true 
in GFP-sld-2 embryos treated with GFP RNAi, 
whereas depletion of GFP-TRES-1 or GFP- 
MUS-101 impaired the recruitment of mCherry- 
CDC-45 to chromatin (Fig. 4D). Similarly, 
RNAi depletion of DNSN-1, SLD-2, TRES-1, or 
MUS-101 delayed chromosome decondensa- 
tion upon entry into S phase of the second cell 
cycle in worms expressing GFP-histone H2B 
and mCherry-CDC-45, but CDC-45 persisted 
on condensed chromatin in response to dnsn-1 
or sld-2 RNAi, whereas depletion of TRES-1 or 
MUS-101 impaired chromatin association of 
CDC-45 (fig. S6). Overall, these findings indi- 
cate that DNSN-1 and SLD-2 are dispensable 
for CDC-45 chromatin recruitment in early S 
phase, in contrast to both TRES-1 and MUS-101. 

mCherry-PSF-1 colocalized with GFP-DNSN-1 
on condensed chromatin during early S phase 
(Fig. 4E and fig. S7; control, 80% embryos, 
n = 10). By contrast, chromatin association of 
mCherry-PSF-1 was lost upon depletion of 
DNSN-1 (GFP RNAi in Fig. 4E and fig. S7;
100% embryos, n = 14), indicating that DNSN-1 
is required for stable incorporation of GINS 
into CMG helicase complexes on chromatin 
during S phase. To ensure that the failure to detect GINS on chromatin in the absence of DNSN-1 was not due to premature ubiquitylation and disassembly of the CMG helicase, we depleted DNSN-1 in combination with NPL-4 (fig. S8). Consistent with previous observations (35), NPL-4 depletion led to persistence of mCherry-PSF-1 on chromatin during mitosis (npl-4 RNAi in fig. S8, A and B), reflecting a failure of CMG disassembly during DNA replication termination. However, chromatin association of mCherry-PSF-1 was reduced around fivefold when GFP-DNSN-1 was depleted in addition to NPL-4 (npl-4 + GFP RNAi in fig. S8, A to C), similar to codepletion of NPL-4 and TRES-1 (npl-4 + tres-1 RNAi in fig. S8, A to C), NPL-4 and MUS-101 (npl-4 + mus-101 RNAi in fig. S8, D to F), or NPL-4 and SLD-2 (npl-4 + GFP RNAi in fig. S8, D to F). By contrast, chromatin-bound GFP-PSF-1 was not reduced by codepletion of RPA-1 and NPL-4 (46), which blocked DNA replication after CMG assembly. These data indicate that DNSN-1 is required to assemble the CMG helicase, together with MUS-101, TRES-1, and SLD-2. Together with the data for recruitment of PSF-1 and CDC-45 during early S phase (Fig. 4), these findings also indicate that DNSN-1 is specifically required for chromatin recruitment of GINS but is dispensable for chromatin recruitment of CDC-45.

Fig. 2. DNSN-1 is important for the initiation of DNA replication. (A) Worms expressing GFP-DNSN-1 and mCherry-histone H2B (KAL267) were fed bacteria expressing the indicated RNAi, and embryos were then examined by spinning-disk confocal microscopy. The images show late S phase of the first embryonic cell cycle. In the bottom panels, contrast was adjusted to monitor residual GFP-DNSN-1 (arrows). The scale bar corresponds to 10 μm. (B) Control (N2) and GFP-dnsn-1 (KAL213) worms were fed on bacteria expressing the indicated RNAi before measurement of embryonic viability (see Materials and methods). The data represent the means and standard deviations from three biological replicates. (C) GFP-dnsn-1 (KAL213) worms were fed on bacteria expressing the indicated RNAi before a population of single cells was isolated from embryos as described in the Materials and methods. The cells were incubated with EdU for 20 min at room temperature and then fixed before EdU detection and DNA staining with Hoechst 33342. Scale bars correspond to 10 μm. (D) Quantification of the data in (C), which correspond to the mean values and standard deviations from three biological replicates. The samples were compared by using a Kruskal-Wallis test followed by Dunn's test, yielding the indicated p values. (E) Control worms (N2) were treated with a pulse of EdU and processed for molecular combing of DNA fibers, as described in the Materials and methods. DNA fibers were stained with YOYO-1, and EdU labeling was detected as above. Fork progression is defined as the length of EdU-labeled tracks, whereas interfork distance corresponds to the distance between two EdU-labeled tracks on the same fiber. The scale bar corresponds to 20 μm. (F) Control worms (N2) were treated with the indicated RNAi and then processed as in (E). The data points from three independent experiments are depicted in a violin plot, with median values shown as red circles. The average of the median values is shown as a red bar with the associated standard deviation. The median values for each experiment were then compared by using a Kruskal-Wallis test followed by Dunn's test, yielding the indicated p values.

Fig. 3. DNSN-1 and MUS-101 are recruited to preinitiation complexes on chromatin during early S phase. (A) Illustration of how DNSN-1 and MUS-101 associate with condensed chromatin during early S phase of the first embryonic cell cycle (left) and remain on chromatin when decondensation is delayed in worms lacking CDC-45 (right). (B) Embryos expressing GFP-DNSN-1 and mCherry-histone H2B (KAL267) were treated with the indicated RNAi before analysis by time-lapse video microscopy. The images show the progression of the female pronucleus through early S phase of the first embryonic cell cycle. White arrows indicate DNSN-1 on persistently condensed chromosomes in the absence of CDC-45; red arrows indicate the loss of DNSN-1 from condensed chromatin in the absence of MCM-7. (C) Similar experiment to that in (B) comparing embryos treated with cdc-45 RNAi or cdc-45 + mcm-7 double RNAi. White arrows indicate DNSN-1 on condensed chromosomes; red arrows indicate the loss of DNSN-1 from condensed chromosomes without MCM-7. (D and E) Equivalent experiments to those in (B) and (C), respectively, but with embryos expressing GFP-MUS-101 and mCherry-histone H2B (KAL276). All scale bars correspond to 5 μm. The female pronucleus is located at a variable depth within the embryo, which can lead to differences in brightness between images. P.b, polar body.


Depletion of DNSN-1 or MUS-101 is synthetic lethal with mcm-10Δ or cdc-7Δ

As noted above, budding yeast Mcm10 is essential for activation of the newly assembled CMG helicase (22, 47, 48). However, deleting 95% of the coding sequence of C. elegans mcm-10 (fig. S9, A, D, and E) was not lethal (Fig. 5B), indicating that other initiation factors must support CMG activation in the absence of MCM-10. To explore how other initiation factors might compensate for the loss of MCM-10, we tested the effects of depleting such factors by RNAi in mcm-10Δ worms, taking advantage of the fact that RNAi depletion of DNSN-1, SLD-2, or TRES-1 is incomplete, as discussed above, and is therefore viable as a single treatment (Fig. 5A). Depletion of DNSN-1, MUS-101, or SLD-2 led to a loss of viability in combination with mcm-10Δ, whereas tres-1 RNAi only caused reduced viability in the absence of MCM-10 (Fig. 5, A and B).

Fig. 4. DNSN-1 and SLD-2 are required for chromatin recruitment of GINS but not CDC-45 during early S phase. (A) Embryos expressing GFP-PSF-1 and mCherry-histone H2B (KAL3) were grown with or without cdc-45 RNAi and then analyzed by time-lapse video microscopy. White arrows indicate GINS on condensed chromosomes in early S phase (63% embryos, n = 16); red arrows indicate the loss of GINS from condensed chromosomes in the absence of CDC-45 (100% embryos, n = 12). (B) Similar experiment to that in (A) with or without GFP RNAi. White arrows indicate the persistence of condensed chromosomes in the absence of GFP-PSF-1. (C) Embryos expressing GFP-PSF-1 and mCherry-CDC-45 (KAL266) were exposed to the indicated RNAi treatment and analyzed as in (A). White arrows indicate CDC-45 on chromatin in the absence of PSF-1; red arrows indicate the loss of CDC-45 from chromatin in the combined absence of PSF-1 and MCM-7. (D) The indicated strains (KAL268, KAL271, KAL274, and KAL277) were grown as shown in the presence or absence of GFP RNAi before analysis as in (A). White arrows indicate the persistence of CDC-45 on chromatin in the absence of DNSN-1 or SLD-2 compared with that in control embryos; red arrows indicate the loss of CDC-45 from chromatin in the absence of TRES-1 or MUS-101. (E) Embryos expressing GFP-DNSN-1 and mCherry-PSF-1 (KAL269) were analyzed as in (A). White arrows indicate colocalization on chromatin of mCherry-PSF-1 and GFP-DNSN-1 during early S phase in control embryos; red arrows indicate the loss of mCherry-PSF-1 from chromatin in the absence of DNSN-1. All scale bars correspond to […]. The female pronucleus is located at a variable depth within the embryo, which can lead to differences in brightness between images.

Fig. 5. Depletion of DNSN-1 or other initiation factors causes synthetic lethality in embryos that lack MCM-10 or CDC-7. (A) The embryonic viability of control worms (N2) was analyzed after they fed on bacteria expressing the indicated RNAi or empty vector as control. The data represent the mean values and standard deviations from three biological replicates. (B) Similar analysis to that in (A) of mcm-10Δ worms (KAL259). (C) Equivalent experiment to that in (A) with cdc-7Δ worms (KAL255).

Previous work showed that worms that lack 68% of the coding sequence of cdc-7 are viable (49), and we found that the same is true for worms in which 97% of the coding sequence was removed by CRISPR-Cas9 (Fig. 5C and fig. S9, B and D). These findings demonstrate that the CDC-7 kinase is not essential for DNA replication initiation in C. elegans. However, RNAi depletion of DNSN-1 or MUS-101 was lethal in cdc-7Δ worms (Fig. 5C), similar to the effects of depleting DNSN-1 or MUS-101 in mcm-10Δ worms. RNAi depletion of TRES-1 was also lethal in cdc-7Δ worms, whereas depletion of SLD-2 reduced viability (Fig. 5C). In addition, cdc-7Δ mcm-10Δ worms were unable to produce viable offspring (n = 7 double-mutant worms). These data are consistent with a role for MCM-10 and CDC-7 during the initiation of DNA replication in C. elegans but also indicate some level of redundancy with other factors.


Although DNSN-1, TRES-1, MUS-101, and SLD-2 are all important for CMG assembly (Fig. 4 and fig. S8), the synthetic lethality data for mcm-10Δ suggest that DNSN-1, SLD-2, and MUS-101 might additionally contribute to the helicase activation step. These findings further indicate that the mechanism and regulation of DNA replication initiation are more complex in C. elegans than would have been predicted by the model derived from studies in budding yeast.


Dimeric DNSN-1 binds the GINS and MCM-3 
components of the CMG helicase 


As a first step toward exploring whether DNSN-1 associates directly with components of the CMG helicase, we tested whether copurification of DNSN-1 with GFP-PSF-1 from embryo extracts required the core replisome components CTF-4, TIM-1, and CLSP-1 (CLASPIN). RNAi depletion of these factors did not affect the copurification of DNSN-1 with PSF-1 (Fig. 6A, table S3, and data S3), suggesting that DNSN-1 binds either directly to CMG components or to other CMG partners such as DNA polymerase ε.

To assay for direct binding of DNSN-1 to CMG, we purified recombinant forms of the C. elegans CMG complex and DNSN-1 from budding yeast cells and bacteria, respectively. The migration of DNSN-1 in a size-exclusion column indicated self-association (fig. S10A), consistent with DNSN-1 homodimerization predicted by AlphaFold-Multimer (fig. S10B). Subsequently, the proteins were analyzed by glycerol gradient ultracentrifugation, both individually and after mixing. This showed that DNSN-1 formed a stable complex with the CMG helicase (Fig. 6, B and C), under conditions where association of DNSN-1 with purified GINS could not be detected (Fig. 6D). Although DNSN-1 can associate with GINS in embryo extracts in the absence of CMG (cdc-45 RNAi in Fig. 1A), these data indicate that DNSN-1 binds more tightly to CMG through additional interaction(s) with MCM-2-7 or CDC-45.

The association of DNSN-1 with the CMG helicase was then analyzed by cryo-electron microscopy (cryo-EM) in the presence of replication fork DNA as well as a complex of TIM-1 and TIPN-1 that was included to stabilize association of CMG with DNA (50) (Fig. 7, figs. S11 to S14, and table S4). The resulting structure, which was determined using cryo-EM density maps with average nominal resolutions of 2.6 to 3.8 Å [Fourier shell correlation (FSC) = 0.143 criterion], confirmed the existence of a DNSN-1 homodimer (Fig. 7B) that associates with the GINS and MCM-3 components of the helicase. Specifically, the folded domain of one protomer of the DNSN-1 dimer forms a large interface on the amino-terminal (forward-facing) side of the MCM-2-7 motor of the CMG helicase. In this way, extensive interactions with the PSF-2, PSF-3, and SLD-5 subunits of GINS serve to position the DNSN-1 dimer on CMG, together with further interactions with the MCM-3 amino-terminal domain [(i) in Fig. 7C].

Fig. 6. DNSN-1 binds directly to the CMG helicase.

In addition to the folded domains of the DNSN-1 dimer, cryo-EM density corresponding to an α helix is observed at two other positions within the complex. First, one α helix is bound to a distant region of GINS beside the amino terminus of SLD-5, where it contacts SLD-5 and PSF-2. Aided by an AlphaFold-Multimer prediction, the amino terminus of DNSN-1 (residues 4 to 19) can be unambiguously modeled into this density [(ii) in Fig. 7C and fig. S13, A to D]. The second α helix interacts with the AAA+ motor domain of MCM-3 and features clear density attributable to a tryptophan residue [(iii) in Fig. 7C and fig. S13E]. With the aid of AlphaFold-Multimer, this density was determined to correspond to DNSN-1 residues 417 to 435 (including the tryptophan) and belongs to the same DNSN-1 protomer for which the folded domain interacts with GINS and the MCM-3 amino-terminal domain, as described above [(iii) in Fig. 7C and fig. S13, E to H]. The interaction of DNSN-1 with the MCM-3 AAA+ domain and the MCM-3 amino-terminal domain, in addition to multiple GINS subunits, likely explains the greater affinity of DNSN-1 for CMG compared with isolated GINS.

To explore the physiological importance of DNSN-1's association with GINS and MCM-3, we generated worms with structure-guided mutations in DNSN-1 that were predicted either to interfere with binding to GINS or to impair association with MCM-3. First, we generated an allele that lacks amino acids 5 to 19 of DNSN-1 (dnsn-1ΔN; fig. S9, C to E), which are located within the amino-terminal helix of DNSN-1 that binds near the amino terminus of SLD-5 [(ii) in Fig. 7C] and are important for stable association of the DNSN-1 dimer with CMG (Fig. 8, A and B).

Worms homozygous for the dnsn-1ΔN allele are viable, but DNA combing showed that dnsn-1ΔN produced a defect in DNA replication initiation (Fig. 8C; dnsn-1ΔN increased the interfork distance). As seen above for cdc-7Δ (Fig. 5C), dnsn-1ΔN was synthetic lethal with RNAi specific to mus-101, tres-1, or dnsn-1 (Fig. 8D), and dnsn-1ΔN was also lethal in combination with both mcm-10Δ (homozygotes were sterile; n = 16 animals) and cdc-7Δ (homozygotes were sterile; n = 16 animals), likely reflecting additive defects in DNA replication initiation. These findings indicate that the association of DNSN-1 with GINS is important for initiation.
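The combing readouts used here follow the definitions in the Fig. 2 legend: fork progression is the length of an EdU-labeled track, and interfork distance is the distance between two EdU-labeled tracks on the same fiber. The Python sketch below shows how both could be computed from measured track coordinates. It is not the authors' analysis pipeline: the input format (start and end positions in μm along a fiber) and the choice to measure interfork distance between the centres of adjacent tracks are assumptions, and the example fiber is invented.

```python
from statistics import median

# Assumed input format: each EdU-labeled track on one DNA fiber is a
# (start, end) pair of positions in micrometres along the fiber.
Track = tuple[float, float]


def fork_progression(tracks: list[Track]) -> list[float]:
    """Fork progression = length of each EdU-labeled track."""
    return [end - start for start, end in tracks]


def interfork_distances(tracks: list[Track]) -> list[float]:
    """Distances between neighbouring EdU-labeled tracks on one fiber.

    Assumption: distance is measured centre to centre, after sorting the
    tracks by their position along the fiber.
    """
    centres = sorted((start + end) / 2 for start, end in tracks)
    return [right - left for left, right in zip(centres, centres[1:])]


# Invented example fiber with three EdU-labeled tracks (deliberately unsorted).
fiber = [(12.0, 15.0), (2.0, 4.0), (30.0, 34.0)]
print(fork_progression(fiber))             # [3.0, 2.0, 4.0]
print(interfork_distances(fiber))          # [10.5, 18.5]
print(median(interfork_distances(fiber)))  # 14.5
```

Per-experiment medians of values like these are what the Fig. 2F and Fig. 8C legends describe comparing between conditions; a longer interfork distance corresponds to fewer initiation events per unit length of DNA.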

In addition, we generated the dnsn-1-3A allele (fig. S9F) with mutations in three conserved residues in DNSN-1 that contact the AAA+ domain of MCM-3 [(iii) in Fig. 7C and fig. S15A]. We found that 100% of homozygous dnsn-1-3A worms were sterile (fig. S9F). To analyze the phenotype of dnsn-1-3A in the early embryo, we generated heterozygous mCherry-dnsn-1-3A/GFP-dnsn-1 worms, together with mCherry-dnsn-1/GFP-dnsn-1 control worms (Fig. 8E), and then treated them with RNAi to GFP and cdc-45 to deplete GFP-DNSN-1 and block CMG assembly, thereby leading to the accumulation of preinitiation complexes on condensed chromosomes, as described above (Fig. 3). Whereas wild-type mCherry-DNSN-1 accumulated on condensed chromosomes during S phase under such conditions (mCherry-DNSN-1 in Fig. 8F), the DNSN-1-3A mutant was present in the nucleus but was not detected on condensed chromatin (mCherry-DNSN-1-3A in Fig. 8F). These data indicate that the interaction of DNSN-1 with MCM-3 is essential in C. elegans and is required for association of DNSN-1 with preinitiation complexes, which assemble on loaded MCM-2-7 double hexamers in early S phase.


Discussion 


Together with past work (40, 47), our data 
demonstrate that CMG helicase assembly 
during S phase in C. elegans requires TRES-1, 
MUS-101, SLD-2, and DNSN-1. Our data indi- 
cate that TRES-1 and MUS-101 are required to 
recruit CDC-45 to the preinitiation complexes 
that form on MCM-2-7 double hexamers (Fig. 
4), likely involving direct binding of the con- 
served Sld3/TRESLIN domain of TRES-1 (57) to 
CDC-45 (52). MUS-101 uses its BRCT1 and 
BRCT2 domains to bind to CDK-phosphorylated 
TRES-1 (47) and recruits DNSN-1 (fig. S4), which,
together with SLD-2, is required to recruit 
GINS (Fig. 4 and fig. S8). Our data indicate 
that DNSN-1 binds simultaneously to GINS 
and to the MCM-3 subunit of MCM-2-7 double 
hexamers (Figs. 7 and 8), thereby positioning 
GINS in a precise fashion to promote CMG 
assembly. 
Fig. 7. Cryo-EM analysis reveals binding of dimeric DNSN-1 to the GINS and MCM-3 components of CMG. (A) Cryo-EM density map colored according to subunit. (B) Atomic model of the DNSN-1 homodimer. Neighboring components of the CMG helicase that are bound by DNSN-1 are represented as transparent surfaces. (C) Full atomic model of the complex formed between DNSN-1 and the CMG helicase (associated with replisome components TIM-1 and TIPN-1, and fork DNA). The GINS (brown) and MCM-3 (light blue) components of CMG that bind DNSN-1 are indicated. Inset images display zoomed-in views of the DNSN-1 interaction sites with CMG: (i) the interface formed between one DNSN-1 protomer, the amino-terminal domain (NTD) of MCM-3, and the GINS subunits SLD-5, PSF-2, and PSF-3; (ii) the interface formed between the DNSN-1 amino terminus and the GINS subunits SLD-5 and PSF-2 (DNSN-1 is colored in alternating shades of green to indicate uncertainty as to which DNSN-1 protomer this helix belongs, because it is linked to the folded domain by ~125 disordered residues); and (iii) the interface formed between DNSN-1 and the MCM-3 AAA+ domain. For clarity, models are represented as cartoons or cartoons within transparent surfaces.


Cryo-EM demonstrates that DNSN-1 is a dimer (Fig. 7). Almost all the observed contacts with GINS and MCM-3 can be assigned to a single DNSN-1 protomer (Fig. 7), suggesting that the DNSN-1 dimer might simultaneously contribute to the production of two CMG complexes during initiation. Symmetrical engagement of both DNSN-1 protomers with MCM-3 and GINS would likely require initial disruption of the MCM-2-7 double hexamer, to allow rotation of the MCM-2-7 rings.

Orthologs of DNSN-1/DONSON are present in other metazoan species, and AlphaFold-Multimer predicts that human DONSON is a dimer that interacts with the AAA+ domain of MCM3, the SLD5 subunit of GINS, and the BRCT3 domain of TOPBP1 (fig. S15). This suggests an evolutionarily conserved role for metazoan DONSON orthologs during DNA replication initiation, consistent with recent data for Xenopus DONSON (53, 54). DONSON orthologs are also present in plants but are absent from budding yeast and many fungi, suggesting that the role of DNSN-1/DONSON in DNA replication initiation emerged at an early step of eukaryotic evolution but was subsequently lost during fungal evolution.
Fig. 8. Interaction of DNSN-1 with GINS and MCM-3 is important for DNA replication initiation. (A) Recombinant wild-type DNSN-1 or alleles with small truncations in the amino-terminal binding site for GINS were mixed as shown with recombinant CMG and then incubated before isolation of CMG by immunoprecipitation on beads coated with anti-MCM-6 antibodies. The indicated proteins were monitored by immunoblotting. (B) Worms expressing mCherry-DNSN-1 (KAL256) or mCherry-DNSN-1Δ5-19 (dnsn-1ΔN, KAL257) from the endogenous dnsn-1 locus were fed on bacteria expressing npl-4 RNAi before preparation of embryonic cell extracts and immunoprecipitation on beads coated with an antibody that recognizes the mCherry tag. The indicated factors were monitored by immunoblotting. (C) Control worms (N2) and dnsn-1ΔN (KAL221) were analyzed by EdU incorporation and molecular combing of DNA fibers, as in Fig. 2F. The individual data points from three independent experiments are depicted in a violin plot. The median values are shown as red circles, and the red bars and error bars represent the average of the median values and the associated standard deviations. The median values for each sample were then compared by using a paired two-tailed t test, yielding the indicated p values. (D) dnsn-1ΔN worms (KAL221) were fed on bacteria expressing the indicated RNAi or empty vector as negative control. The data represent the means and standard deviations from three biological replicates. (E) Embryos expressing GFP-dnsn-1/mCherry-dnsn-1 (progeny of KAL213 and KAL256) or GFP-dnsn-1/mCherry-dnsn-1-3A (progeny of KAL213 and KAL303) were analyzed during S phase of the first cell cycle by time-lapse video microscopy. DNSN-1-3A has mutations in the binding site of DNSN-1 with the AAA+ domain of MCM-3 [see (iii) in Fig. 7C, fig. S9F, and table S5]. (F) Similar experiment to that in (E) after RNAi depletion of CDC-45 and GFP-DNSN-1. White arrows indicate the association of mCherry-DNSN-1 with preinitiation complexes that persist on condensed chromosomes in the absence of CDC-45. Red arrows indicate the lack of chromatin association for DNSN-1-3A. All scale bars correspond to 5 μm.
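The Fig. 8C legend compares per-experiment median values with a paired two-tailed t test. A minimal Python sketch of that calculation follows. It assumes that medians are paired by experiment (control and mutant measured side by side in the same experiment) and uses invented numbers. With three experiments the test has 2 degrees of freedom, for which Student's t distribution has a closed-form CDF; for other sample sizes a general routine such as scipy.stats.ttest_rel would be needed instead of the helper below.

```python
import math
from statistics import mean, stdev


def paired_t(control: list[float], treated: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [trt - ctl for ctl, trt in zip(control, treated)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


def two_tailed_p_df2(t_stat: float) -> float:
    """Two-tailed p value for a t statistic with exactly 2 degrees of freedom.

    For 2 df the CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    so p = 2 * (1 - F(|t|)) = 1 - |t| / sqrt(2 + t**2).
    """
    return 1.0 - abs(t_stat) / math.sqrt(2.0 + t_stat**2)


# Invented per-experiment median interfork distances (three experiments).
control_medians = [40.0, 42.0, 38.0]
mutant_medians = [55.0, 60.0, 52.0]

t_stat, df = paired_t(control_medians, mutant_medians)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 13.04, df = 2
if df == 2:
    print(f"p = {two_tailed_p_df2(t_stat):.4f}")  # p = 0.0058
```

Pairing by experiment removes day-to-day variation shared by both samples, which is why the statistic is built from the per-experiment differences rather than from the two groups independently.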


The ability of DNSN-1 to bind to CMG suggests that it might also have roles after helicase assembly. One possibility is that DNSN-1 contributes to the helicase activation step that in yeast requires Mcm10, because worms lacking MCM-10 are viable but are highly sensitive to depletion of DNSN-1 (Fig. 5). A further possibility would be that DNSN-1 is required to maintain the integrity of CMG at replication forks, but we disfavor this idea for several reasons. For example, mutation or depletion of DNSN-1 does not impair fork progression (Figs. 2F and 8C), recombinant CMG is inherently stable in the absence of DNSN-1, and a reconstituted human replisome supports efficient DNA synthesis in the absence of DONSON (25). Our data do not exclude some other role for DNSN-1/DONSON at replication forks, such as a role in the repair of interstrand DNA cross-links, as reported previously for human DONSON (39). However, a recent study (55) has found that human DNA polymerase α-primase binds to CMG in a manner that would clash with DONSON binding (fig. S18), indicating that DONSON cannot remain bound to CMG at active forks in the same configuration that is observed by cryo-EM.

Much remains to be learned in the future 
regarding the mechanism of metazoan DNA 
replication initiation, which clearly involves 
considerable additional complexity beyond the 
model established for budding yeast. Further 
insights in this area will inform our under- 
standing of how DNA replication initiation is 
deregulated during tumor development, as well 
as provide further mechanistic insight into how 
DNA replication initiation defects might under- 
pin the development of human disease syn- 
dromes that are associated with mutations 
in DONSON (33, 34) and the SLD-2 ortholog
RECQL4 (56, 57).


Materials and methods 
Experimental model and subject details 


The C. elegans strains used in this study were 
derived from the “Bristol N2” wild type and 
are described in table S5. Alleles generated with 
the CRISPR-Cas9 genome-editing system (InVivo 
Biosystems and Suny Biotech) were subsequent- 
ly out-crossed eight times with the N2 wild type. 

For expression of proteins in budding yeast, and as detailed in table S5, one of the three S. cerevisiae strains yJF1 (MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 bar1Δ::hphNT pep4Δ::kanMX), YSS3 (MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 pep4Δ::ADE2), or YSS4 (MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 pep4Δ::ADE2) was transformed with the indicated linearized plasmids using standard procedures. The codon usage of the expression constructs was optimized for high-level expression in S. cerevisiae, as described previously (7). The codon-optimized DNA sequences were synthesized by GenScript.

For expression of proteins in bacteria, the 
plasmids listed in table S5 were transformed 
into the Escherichia coli strain Rosetta (DE3) 
pLysS (70956, Novagen). 


Method details 

C. elegans maintenance 

Worms were maintained according to standard 
procedures (58) and were grown on nematode 
growth medium (NGM) (3 g/liter NaCl, 2.5 g/liter peptone, 20 g/liter agar, 5 mg/liter cholesterol, 1 mM CaCl₂, 1 mM MgSO₄, 2.7 g/liter KH₂PO₄, 0.89 g/liter K₂HPO₄).


RNA interference 


RNAi was performed by feeding worms with bacteria containing plasmids that express dsRNA. RNase III-deficient HT115 bacteria were transformed with an indicated L4440-derived plasmid. For microscopy experiments, worms were fed on 6-cm plates containing the following medium: 3 g/liter NaCl, 20 g/liter agarose, 5 mg/liter cholesterol, 1 mM CaCl₂, 1 mM MgSO₄, 2.7 g/liter KH₂PO₄, 0.89 g/liter K₂HPO₄, 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG), and 100 mg/liter ampicillin. For immunoprecipitation experiments, worms were fed on 15-cm plates containing NGM medium supplemented with 1 mM IPTG and 100 mg/liter ampicillin.

The plasmids expressing dsRNA were either derived from a commercial RNAi library (SourceBioscience, UK; clsp-1) or made by cloning polymerase chain reaction (PCR) products into the vector L4440. In the latter case, we either amplified 1-kb products from cDNA (npl-4.2 isoform a, tim-1, dnsn-1c, sld-2, tres-1, mus-101), amplified full-length cDNA for the open reading frame of the cdc-45 gene, or amplified full-length transcripts of GFP-tagged sequences from KAL213 using a cDNA library that was kindly provided by S.-L. Offenburger and G. Saredi. Details of sequences used in the RNAi vectors are provided in table S5.

To target more than one gene simultane- 
ously by RNAi, as indicated above, we either 
fed a mixture of bacteria expressing the cor- 
responding dsRNA or cloned contiguous 1-kb 
fragments for each gene into a single L4440
plasmid. Empty L4440 vector was used as the 
control for RNAi experiments throughout this 
study. 


Microscopy 


Worms at the larval L4 stage were incubated 
on 6-cm RNAi feeding plates for 48 hours at 
20°C. Adult worms were then dissected in M9 
medium (6 g/liter Na₂HPO₄, 3 g/liter KH₂PO₄,
5 g/liter NaCl, 0.25 g/liter MgSO₄), and five
embryos were transferred onto a 2% agarose 
pad and recorded simultaneously from the one- 
cell stage to four cells. To record embryos 
during early S phase, one or two embryos were 
transferred onto a thick 1% agarose pad and 
filmed until first mitosis, and the procedure 
was repeated to obtain five embryos. 

For the experiment in Fig. 8, E and F, we generated heterozygous gfp-dnsn-1/mCherry-dnsn-1-3A hermaphrodites to allow the phenotype of mCherry-dnsn-1-3A to be examined after depletion of wild-type GFP-DNSN-1 by GFP RNAi, because the mCherry-dnsn-1-3A allele is lethal when homozygous. As a control, we performed similar experiments with heterozygous gfp-dnsn-1/mCherry-dnsn-1 hermaphrodites, in which both tagged versions of dnsn-1 lacked the 3A mutations in the binding site of DNSN-1 with the AAA+ domain of MCM-3. To generate these strains, we crossed gfp-dnsn-1/gfp-dnsn-1 males to mCherry-dnsn-1-3A/dnsn-1 hermaphrodites or mCherry-dnsn-1/mCherry-dnsn-1 hermaphrodites. The resulting gfp-dnsn-1/mCherry-dnsn-1 and gfp-dnsn-1/mCherry-dnsn-1-3A heterozygous hermaphrodites were then treated with RNAi to cdc-45 + GFP or empty RNAi vector before dissection of embryos. Only the embryos expressing mCherry-DNSN-1 (wild type or DNSN-1-3A) were analyzed.

Time-lapse images were recorded at 24°C as 
described previously (35), with images taken 
every 10 s using a Zeiss Cell Observer SD mi- 
croscope with a Yokogawa CSU-X1 spinning 
disk and a HAMAMATSU C13440 camera fit- 
ted with a PECON incubator and a 60X/1.40 
Plan Apochromat oil immersion lens (Olympus). 
A single optical section (z-layer) was imaged 
for each time point. 

Images were captured using the ZEN blue 
software (Zeiss) and processed and analyzed 
with ImageJ software (National Institutes of 
Health). For each time-lapse experiment, the 
raw images were cropped, the intensity scale 
was adjusted similarly for all conditions, the 
“bit depth” was changed from 16 to 8 bits, and 
videos were assembled. For selected time points, 
images from videos were further cropped to 
focus on the “female” nucleus or nuclei, or on 
the entire embryo. Each series of images was 
then combined into a contiguous sequence, and 
the images were subjected to Gaussian Blur 
with a radius of 1 pixel. Subsequently, the pixel 
density was adjusted to 300 dots per inch. 
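The bit-depth conversion described above maps a chosen 16-bit intensity range onto 0 to 255, with the same scaling applied to every condition. The Python sketch below shows a generic linear version of that mapping. It is an illustration only: it does not claim to reproduce ImageJ's exact rounding, the display-range values are hypothetical, and the subsequent Gaussian blur (radius 1 pixel) is not included.

```python
def to_8bit(pixels: list[int], display_min: int, display_max: int) -> list[int]:
    """Linearly rescale 16-bit intensities to 8 bits using one display range.

    Values at or below display_min become 0 and values at or above
    display_max become 255; reusing the same range for every condition
    keeps intensities comparable between images.
    """
    if display_max <= display_min:
        raise ValueError("display_max must exceed display_min")
    span = display_max - display_min
    out = []
    for value in pixels:
        clipped = min(max(value, display_min), display_max)
        out.append(round((clipped - display_min) * 255 / span))
    return out


# Hypothetical display range shared by all conditions in one experiment.
print(to_8bit([100, 300, 50, 4000], display_min=100, display_max=300))
# [0, 255, 0, 255]
```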

The signal intensity for mCherry-PSF-1 in fig. S8 was quantified using ImageJ by calculating the integrated density of an area containing the metaphase plate and then subtracting the background (integrated density of another area of the embryo that was the same size but lacked chromatin). The mean value for five embryos was then determined, together with the standard deviation.
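The background correction above amounts to one subtraction per embryo followed by a mean and standard deviation across embryos. A minimal Python sketch with invented integrated-density values is shown below. It assumes that both measured areas have the same size (as the text specifies) and uses the sample standard deviation, which the text does not state explicitly.

```python
from statistics import mean, stdev


def corrected_signal(chromatin_intden: float, background_intden: float) -> float:
    """Chromatin signal minus the background from an equal-sized chromatin-free area."""
    return chromatin_intden - background_intden


# Invented (chromatin area, background area) integrated densities for five embryos.
measurements = [
    (5200.0, 1200.0),
    (4800.0, 1000.0),
    (5100.0, 1100.0),
    (5600.0, 1400.0),
    (4900.0, 900.0),
]

signals = [corrected_signal(chrom, bg) for chrom, bg in measurements]
print(signals)  # [4000.0, 3800.0, 4000.0, 4200.0, 4000.0]
print(f"mean = {mean(signals):.1f}, SD = {stdev(signals):.1f}")  # mean = 4000.0, SD = 141.4
```

Dividing such means for two conditions (for example, npl-4 RNAi alone versus npl-4 + GFP RNAi) gives the kind of fold change described for fig. S8.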


Viability and synthetic lethality analysis 
in C. elegans 


For synthetic lethal analysis involving RNAi, RNase III-deficient HT115 bacteria were transformed with an L4440-derived plasmid corresponding to the required RNAi treatment. For the experiment in Fig. 2B, bacterial cultures expressing dnsn-1 or gfp dsRNA, or containing an empty plasmid, were used to feed N2 or GFP-dnsn-1 worms. For the experiment in Fig. 5, bacterial cultures expressing dnsn-1, sld-2, tres-1, or mus-101 dsRNA, or containing an empty plasmid, were used to feed N2, mcm-10Δ, or cdc-7Δ worms. For Fig. 8D, bacterial cultures expressing dnsn-1, sld-2, tres-1, or mus-101 dsRNA, or containing an empty plasmid, were used to feed N2 or dnsn-1ΔN worms. All cultures were grown to an optical density at 600 nm (OD600) of 1, and worms were then incubated on RNAi feeding plates for 48 hours at 20°C. For each condition, triplicate experiments were performed; in each, five adult worms were allowed to lay embryos on a plate for 180 min, after which the adults were removed and the embryos were counted. Two days later, the number of embryos that had developed into viable adults was determined (between 50 and 150 embryos for each set of embryos from five worms). Embryonic viability was expressed as the ratio of the number of viable embryos to the total number of embryos, and the mean and standard deviation were then determined for each triplicate set.
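The viability calculation can be written as a short sketch, assuming that each replicate plate yields one (viable adults, embryos laid) pair and that the sample standard deviation is used; the function names and counts are invented for illustration.

```python
import statistics


def embryonic_viability(viable_adults, embryos_laid):
    """Fraction of embryos laid by five adults that developed into viable adults."""
    if embryos_laid <= 0:
        raise ValueError("no embryos were counted on this plate")
    if not 0 <= viable_adults <= embryos_laid:
        raise ValueError("viable count must lie between 0 and the embryos laid")
    return viable_adults / embryos_laid


def triplicate_viability(plates):
    """plates: three (viable_adults, embryos_laid) pairs for one strain/RNAi condition.
    Returns the mean and sample standard deviation of the per-plate ratios."""
    ratios = [embryonic_viability(v, n) for v, n in plates]
    return statistics.mean(ratios), statistics.stdev(ratios)


# Hypothetical counts for one condition (three plates of 50-150 embryos each)
mean, sd = triplicate_viability([(90, 100), (64, 80), (120, 120)])
print(f"viability {mean:.2f} +/- {sd:.2f}")  # viability 0.90 +/- 0.10
```

Computing a ratio per plate before averaging gives each replicate equal weight, even when plates differ in the number of embryos laid.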

Viability of gene deletions, or viability of 
CRISPR-generated mutations in dnsn-1, was 
monitored by counting the progeny of worms 
that were subsequently genotyped by PCR to 
confirm homozygosity. Note that the maternal 
contribution of a gene is often sufficient to 
support completion of embryogenesis, whereas 
the zygotic contribution is only required when 
the maternal contribution has run out, often 
after embryogenesis. An RNAi experiment de- 
pletes the maternal contribution, leading to 
embryonic lethality if the gene is essential and 
depletion is sufficient. 

By contrast, a null mutant of an essential 
gene (e.g., complete deletion of the coding 
sequence by CRISPR-Cas9) only removes the 
zygotic contribution to a homozygous embryo 
from a heterozygote parent. This can lead to 
larval death when the maternal contribution 
eventually runs out or to adult sterility, be- 
cause cell division in the adult is restricted to 
the germline. 


Protease inhibitor cocktails 


The following cocktails of protease inhibitors were used as indicated in the sections below:

Cocktail 1: 1X cocktail corresponded to one Roche EDTA-free protease inhibitor tablet (000000011873580001, Roche) per 25 ml of buffer (one tablet dissolved in 1 ml water makes a 25X stock solution) plus 1 ml of Sigma protease inhibitor cocktail (P8215, Sigma-Aldrich) per 100 ml of buffer.

Cocktail 2: 1X cocktail corresponded to one Roche EDTA-free protease inhibitor tablet (000000011873580001, Roche) per 25 ml of buffer (one tablet dissolved in 1 ml water makes a 25X stock solution) together with 0.5 mM phenylmethylsulfonyl fluoride (PMSF), 5 mM benzamidine HCl, 1 mM 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF) (A8456, Sigma-Aldrich), and 1 mg/ml pepstatin A (P5318, Sigma-Aldrich).

Cocktail 3: 1X cocktail corresponded to one Roche EDTA-free protease inhibitor tablet (000000011873580001, Roche) per 25 ml of buffer (one tablet dissolved in 1 ml water makes a 25X stock solution) plus 0.5 mM PMSF.
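The per-volume definitions above translate directly into stock volumes via C1V1 = C2V2. The helper below is a hypothetical sketch: it covers the Roche 25X stock, the Sigma P8215 addition, and PMSF, and it assumes a 100 mM PMSF stock, which the protocol does not specify. The other Cocktail 2 components are left out for brevity.

```python
def stock_volume_ml(final_conc, stock_conc, total_ml):
    """Volume of stock that gives final_conc in total_ml (C1*V1 = C2*V2).
    final_conc and stock_conc must be in the same units."""
    return final_conc * total_ml / stock_conc


def cocktail_volumes(total_ml, cocktail, pmsf_stock_mM=100.0):
    """Stock volumes (ml) needed for total_ml of buffer at 1X.

    Roche tablets: 1 per 25 ml, i.e. a 25X stock (1 tablet per ml of water).
    Sigma P8215 (cocktail 1): 1 ml per 100 ml of buffer.
    PMSF (cocktails 2 and 3): 0.5 mM final; pmsf_stock_mM is an assumed stock.
    """
    volumes = {"roche_25X_stock": stock_volume_ml(1, 25, total_ml)}
    if cocktail == 1:
        volumes["sigma_P8215"] = total_ml / 100
    elif cocktail in (2, 3):
        # Cocktail 2 also needs benzamidine, AEBSF and pepstatin A (not shown).
        volumes["PMSF_stock"] = stock_volume_ml(0.5, pmsf_stock_mM, total_ml)
    else:
        raise ValueError("cocktail must be 1, 2 or 3")
    return volumes


print(cocktail_volumes(50, 1))  # {'roche_25X_stock': 2.0, 'sigma_P8215': 0.5}
```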


Extracts of worm embryos and 
immunoprecipitation of worm replisome 


RNase III-deficient HT115 bacteria were transformed with an L4440-derived plasmid corresponding to the required RNAi treatment. A 10-ml preculture was then grown overnight and used to inoculate a 450-ml culture in “Terrific Broth” (12 g/liter tryptone, 24 g/liter yeast extract, 9.4 g/liter K2HPO4, 2.2 g/liter KH2PO4, adjusted to pH 7). After 7 hours of growth in a baffled flask at 37°C with agitation, expression of dsRNA was induced overnight at 20°C by addition of 3 mM IPTG. The bacteria were then pelleted and resuspended in one-fifth volume of 5X LCS buffer (M9 medium supplemented with 75 mg/liter cholesterol, 100 mg/liter ampicillin, 50 mg/liter tetracycline, 12.5 mg/liter amphotericin B, and 3 mM IPTG).

For each experiment, 0.7 ml of a synchronized population of L4 worms expressing TAP-PSF-1 (table S3), GFP-PSF-1 (Figs. 1, A and B, and 6A; fig. S2, A and B; and table S1), GFP-DNSN-1 (Fig. 1, B and C; fig. S2, C and D; and table S2), DNSN-1-GFP (Fig. 1B; fig. S2, C and D; and table S2), or mCherry-DNSN-1 (Fig. 8B) was fed for 50 hours at 20°C on a 15-cm RNAi plate (see above) supplemented with 10 g of bacterial pellet for the required RNAi treatment, prepared as described above. After feeding, the adult worms were washed in M9 medium, resuspended for 2 min at room temperature in 14 ml of “bleaching solution” (for 100 ml: 36.5 ml H2O, 45.5 ml 2 M NaOH, and 7 ml 10% NaClO), and then pelleted for 1 min at 300g. This bleaching procedure was repeated two more times, corresponding to a total of 12 min in bleaching solution, to lyse the adult worms and release embryos (about 0.6 to 0.8 g). After bleaching, the embryos were washed twice with M9 medium.

The remaining steps were performed at 4°C as previously described (35). Embryos were washed twice with lysis buffer (100 mM HEPES-KOH pH 7.9, 100 mM potassium acetate, 10 mM magnesium acetate, 2 mM EDTA, 0.02% IGEPAL CA-630, 10% glycerol) and then resuspended in three volumes of lysis buffer supplemented with 2 mM sodium fluoride, 2 mM sodium β-glycerophosphate pentahydrate, 1 mM dithiothreitol (DTT), 1X protease inhibitor cocktail 1, and 5 μM propargylated ubiquitin (Ub-PrG) to inhibit deubiquitylase enzymes (kindly provided by A. Knebel and C. Johnson; DU49003, MRC PPU Reagents and Services). The mixture was transferred dropwise into liquid nitrogen to prepare “popcorn,” which was stored at −80°C. We then ground ~2.5 g of the frozen popcorn in a SPEX SamplePrep 6780 Freezer/Mill. After thawing, we added one-quarter volume of lysis buffer (with an additional 1 mM DTT, 2 mM sodium fluoride, 2 mM sodium β-glycerophosphate pentahydrate, and 1X protease inhibitor cocktail 1). Chromosomal DNA was digested with 1600 U of universal nuclease (Pierce, 88702, Thermo Fisher Scientific) for 30 min at 4°C. Extracts were centrifuged at 25,000g for 30 min and then at 100,000g for 1 hour. Fifty microliters of extract was added to 100 μl of 1.5X Laemmli buffer and stored at −80°C. The remaining ~2 ml of extract was then incubated for 90 min with 40 μl of a slurry of GFP-Trap Magnetic Particles M-270 (gtd, Chromotek), 40 μl of a slurry of RFP-Trap Magnetic Particles M-270 (rtdk, Chromotek), or 200 μl of a slurry of magnetic beads (Dynabeads M-270 Epoxy; 14302D, Thermo Fisher Scientific) coupled to rabbit immunoglobulin G (S1265, Sigma-Aldrich) as described below. The beads were washed four times with 1 ml of wash buffer (lysis buffer supplemented with 1 mM DTT, 2 mM sodium fluoride, 2 mM sodium β-glycerophosphate pentahydrate, and 1X protease inhibitor cocktail 1), and the bound proteins were eluted at 95°C for 5 min in 100 μl of 1X Laemmli buffer (or 50 μl when used for mass spectrometry analysis) and stored at −80°C.


Mass spectrometry 


Samples were purified from worm embryos as above and eluted in 50 μl of Laemmli buffer, of which 30 μl was resolved by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) using NuPAGE Novex 4-12% Midi Bis-Tris gels (NP0321, Life Technologies) with NuPAGE MOPS SDS buffer (NP000102, Life Technologies). Gels were then stained with SimplyBlue SafeStain colloidal Coomassie (LC6060, Invitrogen), and each lane was cut into 40 slices that were digested with trypsin before processing for mass spectrometry (MS Bioworks). Data were analyzed using Scaffold software (Proteome Software Inc.).


Purification of C. elegans proteins 


Proteins purified in this study are listed in table S5. TEV protease (DU6811, MRC PPU Reagents and Services) and PreScission protease (DU34905, MRC PPU Reagents and Services) were kindly provided by A. Knebel. The other proteins were produced as described in the following sections, using buffer A: 25 mM Hepes-KOH pH 7.6, 10% glycerol, 0.02% IGEPAL CA-630, 1 mM tris(2-carboxyethyl)phosphine (TCEP).


Expression of proteins in budding yeast 


The S. cerevisiae strains used in this study are shown in table S5. Yeast cells were grown at 25°C in YP medium (1% yeast extract, 21275, Becton Dickinson; 2% bacteriological peptone, LP0037B, Oxoid) supplemented with 2% raffinose. In each case, a 12-liter exponential culture was grown to 2 × 10^7 to 3 × 10^7 cells/ml and then induced for 6 hours at 20°C by addition of galactose to a final concentration of 2%. Cells were collected by centrifugation and washed once with lysis buffer (indicated below for each purification) lacking protease inhibitors. Cell pellets (~30 g) were then resuspended in 0.3 volumes of the indicated lysis buffer containing protease inhibitors. The resulting suspensions were frozen dropwise in liquid nitrogen and stored at −80°C. Subsequently, the entire sample of frozen yeast cells was ground in the presence of liquid nitrogen using a SPEX CertiPrep 6850 Freezer/Mill with three cycles of 2 min at a rate of 15 cycles per second. The resulting powders were stored at −80°C.


CMG helicase 


CMG was purified by a modified version of a previously described method (35). Yeast cell powder was thawed in buffer A/0.2 M KCl/2 mM Mg(OAc)2/1X protease inhibitor cocktail 2. Universal nuclease (Pierce, 88702, Thermo Fisher Scientific) was then added to the whole-cell extract at 250 U/ml, and the sample was incubated at 4°C for 30 min with rotation. The mixture was centrifuged at 100,000g at 4°C for 30 min, followed by a second centrifugation at 235,000g at 4°C for 1 hour. The soluble extract was then recovered and mixed with 2 ml of immunoglobulin G (IgG) resin (17096901, GE), and the mixture was incubated at 4°C for 2 hours with rotation.

The resin was collected and washed extensively with buffer A/0.2 M KCl/2 mM Mg(OAc)2/1X protease inhibitor cocktail 2. The resin was then incubated with 10 ml of buffer A/0.2 M KCl/10 mM Mg(OAc)2/1X protease inhibitor cocktail 2/2 mM ATP at 4°C for 10 min to remove chaperones and then washed extensively with buffer A/0.2 M KCl/2 mM Mg(OAc)2. The purified proteins were then eluted by overnight incubation with rotation in 2 ml of buffer A/0.2 M KCl/2 mM Mg(OAc)2 containing 100 μg of PreScission protease.

The supernatant was collected, and the resin was eluted twice more with 2 ml of buffer A/0.2 M KCl/2 mM Mg(OAc)2. The pooled eluate was diluted to 10 ml and loaded onto a 1-ml HiTrap Q column in buffer A/0.2 M KCl/2 mM Mg(OAc)2. CMG was eluted with a 30-ml gradient from 0.2 to 1 M KCl in buffer E/2 mM Mg(OAc)2. The peak fractions were concentrated and loaded onto a 24-ml Superose 6 column in buffer A/0.2 M KOAc/2 mM Mg(OAc)2. Peak fractions containing CMG were pooled, aliquoted, snap frozen in liquid nitrogen, and stored at −80°C.


GINS 

Yeast cell powder was thawed in buffer A/0.5 M NaCl/1X protease inhibitor cocktail 2. Universal nuclease (Pierce, 88702, Thermo Fisher Scientific) was then added to the whole-cell extract at 250 U/ml, and the sample was incubated at 4°C for 30 min with rotation. The mixture was centrifuged at 100,000g at 4°C for 30 min, followed by a second centrifugation at 235,000g at 4°C for 1 hour. The soluble extract was then recovered and mixed with 2 ml of IgG resin (17096901, GE), and the mixture was incubated at 4°C for 2 hours with rotation.

The resin was collected and washed extensively with buffer A/0.5 M NaCl/1X protease inhibitor cocktail 2. The resin was then incubated with 10 ml of buffer A/0.5 M NaCl/10 mM Mg(OAc)2/1X protease inhibitor cocktail 2/2 mM ATP at 4°C for 10 min to remove chaperones and then washed extensively with buffer A/0.5 M NaCl. The purified proteins were then eluted by overnight incubation with rotation in 2 ml of buffer A/0.5 M NaCl containing 100 μg of PreScission protease.

The supernatant was collected, and the resin was eluted twice more with 2 ml of buffer A/0.5 M NaCl. The pooled eluate was concentrated and loaded onto a 24-ml Superdex 200 column in buffer A/0.5 M NaCl. Peak fractions containing GINS were pooled, aliquoted, snap frozen in liquid nitrogen, and stored at −80°C.


TIM-1/TIPN-1


TIM-1/TIPN-1 was purified as previously described (59). Briefly, yeast cell powder was thawed in buffer A/0.2 M NaCl/1X protease inhibitor cocktail 2. After centrifugation, the soluble yeast extract was mixed with calmodulin affinity resin (17052901, GE), and TIM-1/TIPN-1 was then eluted in buffer A/0.2 M NaCl/2 mM EDTA/2 mM EGTA. The eluate was then incubated overnight at 4°C with rotation in the presence of 100 μg of PreScission protease. The sample was concentrated and loaded onto a 24-ml Superdex 200 column in buffer A/0.3 M KOAc. The peak fractions were then pooled, concentrated, and reloaded onto a 24-ml Superdex 200 column in buffer A/0.3 M KOAc. Finally, the peak fractions were pooled, concentrated, aliquoted, and snap frozen in liquid nitrogen before storage at −80°C.


Expression of proteins in bacterial cells 


The plasmids for bacterial expression used in this study are shown in table S5. Each plasmid was transformed into Rosetta (DE3) pLysS (70956, Novagen), which was grown in LB medium supplemented with 50 μg/ml ampicillin (pET15b-based plasmids) or 50 μg/ml kanamycin (pK27SUMO-based plasmids). A 10-ml culture was grown overnight at 37°C with shaking at 200 rpm. The following morning, the culture was diluted 50-fold into 500 ml of selective medium and grown at 37°C until an OD600 of 1 was reached. At this point, 1 mM IPTG was added, and expression was induced overnight at 18°C. Cells were harvested by centrifugation for 10 min in a JLA-9.1000 rotor (Beckman) at 5000 rpm. The cell pellets were then stored at −80°C.


DNSN-1 (full-length and truncated proteins) 


Pellets were resuspended in 20 ml of buffer A/0.5 M NaCl/20 mM imidazole/1X protease inhibitor cocktail 3 with 500 μg/ml lysozyme, and the mixture was incubated at 4°C for 30 min with rotation. The sample was then sonicated twice for 90 s (15 s on, 30 s off) at 40% amplitude on a Branson Digital Sonifier. The mixture was centrifuged at 100,000g at 4°C for 30 min. The soluble extract was then recovered and mixed with 1 ml of Ni-NTA resin (30210, QIAGEN), and the mixture was incubated at 4°C for 2 hours with rotation.

The resin was collected and washed extensively with buffer A/0.5 M NaCl/20 mM imidazole/1X protease inhibitor cocktail 3. The resin was then incubated with 10 ml of buffer A/0.5 M NaCl/20 mM imidazole/1X protease inhibitor cocktail 3/10 mM Mg(OAc)2/2 mM ATP at 4°C for 10 min to remove chaperones and then washed extensively with buffer A/0.5 M NaCl/20 mM imidazole. Proteins were eluted with 5 ml of buffer A/0.5 M NaCl/250 mM imidazole. Ulp1 protease (10 μg/ml) was added to cleave the His-SUMO tag from DNSN-1, and the mixture was incubated for 1 hour on ice.

The sample was concentrated and loaded onto a 24-ml Superdex 200 column in buffer A/0.5 M NaCl. The peak fractions were pooled, concentrated, aliquoted, snap frozen in liquid nitrogen, and stored at −80°C.


Isolation of C. elegans embryonic cells 


RNase III-deficient HT115 bacteria were transformed with an L4440-derived plasmid corresponding to the required RNAi treatment. A 20-ml culture was then grown in Terrific Broth. After 7 hours of growth in a baffled flask at 37°C with agitation, expression of dsRNA was induced overnight at 20°C by addition of 3 mM IPTG. The bacteria were then pelleted and resuspended in one-fifth volume of 5X LCS buffer.

For each experiment, 10 μl of a synchronized population of L4 worms was fed for 48 hours at 20°C on a 6-cm NGM plate supplemented with bacterial pellet for the required RNAi treatment, prepared as described above. After feeding, the adult worms were washed in M9 medium, resuspended for 10 min at room temperature in 1 ml of bleaching solution, and then pelleted for 1 min at 300g. Released embryos were washed twice with M9 medium and resuspended in 1 ml of Dulbecco’s Modified Eagle Medium (DMEM) (11960044, Thermo Fisher Scientific) supplemented with 10% fetal bovine serum (FBS, FCSSA/500, LabTech). To remove the eggshell, 16 μl of chitinase (25 U/ml, C6137-25UN, Sigma-Aldrich) was added, and the suspension was rotated gently for 45 min at room temperature. Cells were dissociated by gently pipetting for 3 min. The remaining embryos and larvae were removed with a 4-μm filter.


EdU cell proliferation assay 


Embryonic cells were isolated as described above. Cells were incubated with 20 μM EdU for 20 min at room temperature with gentle rotation. Cells were pelleted for 1 min at 1000g and 4°C, washed in 1 ml of cold phosphate-buffered saline (PBS), and resuspended. Cells were pelleted again for 1 min at 1000g and 4°C and then resuspended in 20 μl of cold PBS. Five microliters of cell slurry was transferred onto a poly-lysine-coated slide and covered with a coverslip, and the slide was then quickly frozen on dry ice. The coverslip was removed before fixation of the slide in cold methanol. The slides were fixed in methanol overnight at −20°C.

The slides were thawed in PBS for 10 min at room temperature, and excess PBS was removed from the slides. The incorporated EdU was then labeled using the Click-iT Plus Alexa Fluor 647 Picolyl Azide Toolkit (C10643, Invitrogen), according to the manufacturer’s instructions. Nuclear DNA was stained with 5 μg/ml Hoechst 33342 (H1399, Invitrogen) for 10 min at room temperature. The slides were washed twice with PBS/T (PBS supplemented with 0.1% Tween) followed by PBS, for 5 min each time at room temperature. The slides were air dried, mounted in 90% glycerol in PBS, and sealed. Microscopy was performed using a Zeiss Cell Observer SD microscope with a Yokogawa CSU-X1 spinning disk and a HAMAMATSU C13440 camera, fitted with a PECON incubator. Images were captured using the ZEN blue software (Zeiss) and analyzed with FIJI software (National Institutes of Health). More than 100 Hoechst-positive cells were scored per sample. The EdU-positive fraction was calculated by dividing the number of cells positive for EdU-Alexa Fluor 647 by the total number of Hoechst-positive cells. The mean and standard deviation were then determined for each triplicate set.
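The EdU-positive fraction is a simple ratio per replicate. A sketch under stated assumptions: counts are tallied per replicate, the "more than 100 cells" statement is treated as a minimum per replicate (an assumption), and replicates are summarized with the sample standard deviation. Names and counts are hypothetical.

```python
import statistics


def edu_positive_fraction(edu_positive, hoechst_positive):
    """Fraction of Hoechst-positive nuclei that are also EdU-Alexa Fluor 647 positive."""
    if hoechst_positive <= 0:
        raise ValueError("no Hoechst-positive cells counted")
    if edu_positive > hoechst_positive:
        raise ValueError("EdU-positive cells cannot outnumber Hoechst-positive cells")
    return edu_positive / hoechst_positive


def summarize_replicates(counts, min_cells=100):
    """counts: (edu_positive, hoechst_positive) per replicate. Replicates with
    min_cells or fewer nuclei are rejected (assumed minimum)."""
    for _, total in counts:
        if total <= min_cells:
            raise ValueError("fewer than the required number of cells scored")
    fractions = [edu_positive_fraction(e, h) for e, h in counts]
    return statistics.mean(fractions), statistics.stdev(fractions)


print(summarize_replicates([(30, 120), (33, 110), (39, 130)]))
```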


Molecular combing of DNA fibers 


Embryonic cells were isolated as described above. Cells were incubated with 200 μM EdU for 30 min at room temperature with gentle rotation. Cells were pelleted for 1 min at 1000g and 4°C, washed in 1 ml of cold PBS, and resuspended. Cells were pelleted again for 1 min at 1000g and 4°C and then resuspended in 90 μl of cold PBS. Combing was performed with the FiberPrep DNA Extraction Kit (EXT-001A, Genomics Vision). A 90-μl aliquot of the cell slurry was used to prepare two agarose plugs, which were digested with Proteinase K overnight at 60°C. The following day, the plugs were washed three times with the kit wash buffer for 1 hour per wash, followed by an additional 3-hour wash. The two plugs from the same sample were then transferred to 2-ml round-bottomed tubes (HP37094D, Eppendorf) and stained with YOYO-1 in the kit staining buffer for 1 hour at room temperature. The plugs were then melted at 68°C for 20 min and equilibrated at 42°C for 10 min, after which β-agarase was added and the samples were incubated for 14 hours at 42°C in the dark. The following day, DNA fibers were combed onto silanized coverslips (CombiCoverslips, COV-002-RUO, Genomics Vision) with the FiberComb Molecular Combing System (MCS-001, Genomics Vision). The combed coverslips were heated at 60°C for 2 hours.

For detection of labeled DNA, the coverslips were dehydrated consecutively in 70, 90, and 100% ethanol for 1 min each. Air-dried coverslips were blocked with BlockAid (B10710, Invitrogen) at 37°C for 30 min. The incorporated EdU was then labeled using the Click-iT Plus Alexa Fluor 647 Picolyl Azide Toolkit (C10643, Invitrogen), according to the manufacturer’s instructions. The coverslips were washed three times with PBS/T for 5 min each at room temperature; dehydrated in 70, 90, and 100% ethanol; air dried; mounted in 90% glycerol in PBS; and sealed. Microscopy was performed using a Zeiss Cell Observer SD microscope with a Yokogawa CSU-X1 spinning disk and a HAMAMATSU C13440 camera, fitted with a PECON incubator. Images were captured using the ZEN blue software (Zeiss) and analyzed with FIJI software (National Institutes of Health). Fork progression was defined as the length of EdU-labeled tracks, and 200 EdU-labeled DNA fibers were measured per sample. The interfork distance was defined as the distance between two EdU-labeled tracks on the same fiber; 200 interfork distances were measured per sample. The mean of the medians from each experiment, and the standard deviation, were then determined for each triplicate set.
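Because each replicate is first reduced to a median before averaging, the summary is robust to a few unusually long tracks. A minimal sketch, assuming track lengths (or interfork distances) are exported per replicate in the same length units and that the sample standard deviation of the medians is reported; the function name and numbers are hypothetical.

```python
import statistics


def average_of_medians(replicates):
    """replicates: one list of measurements (e.g. 200 EdU track lengths, or
    200 interfork distances) per experiment. Returns the mean of the
    per-experiment medians and the sample standard deviation of those medians."""
    medians = [statistics.median(values) for values in replicates]
    return statistics.mean(medians), statistics.stdev(medians)


# Three made-up experiments with a handful of track lengths each;
# the 30.0 outlier in the first barely moves its median.
tracks = [[8.0, 9.5, 10.0, 30.0], [9.0, 10.0, 11.0], [10.0, 11.0, 12.0, 13.0]]
print(average_of_medians(tracks))
```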


Immunoprecipitation of reconstituted complexes 


Reactions (typically 10 μl in volume) containing 25 mM Hepes-KOH (pH 7.6), 0.02% IGEPAL CA-630, 0.1 mg/ml bovine serum albumin (BSA), 1 mM DTT, 100 mM KOAc, 10 mM Mg(OAc)2, 0.5 mM adenylyl-imidodiphosphate (AMP-PNP), 50 nM leading DNA substrate (comprising 46-base pair dsDNA and a 39-nucleotide “3′-flap” of ssDNA; the sequences are shown in table S5), and 3.3 μl of protein mix were assembled on ice for 30 min. The protein mix contained 300 mM KOAc, so the final KOAc concentration of the reactions was 200 mM. Each sample was then incubated at 4°C with 5 μl of magnetic beads (Dynabeads M-270 Epoxy; 14302D, Thermo Fisher Scientific) that had been coupled to anti-SLD-5 antibodies as described below. After 1 hour, protein complexes bound to the magnetic beads were washed twice with 1 ml of buffer containing 25 mM Hepes-KOH (pH 7.6), 0.02% IGEPAL CA-630, 0.1 mg/ml BSA, 1 mM DTT, 10 mM Mg(OAc)2, and 200 mM KOAc. The bound proteins were eluted at 95°C for 5 min in 30 μl of 1X Laemmli buffer.

For the experiment in Fig. 6D, a 10-μl reaction with the indicated components, corresponding to 15 nM CMG, 15 nM GINS, and 30 nM DNSN-1 (as a dimer), was used. For the experiment in Fig. 8A, a 10-μl reaction with the indicated components, corresponding to 15 nM CMG and 50 nM DNSN-1 variants (as a dimer), was used.
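The 200 mM final KOAc follows from adding the salt carried in with the protein mix (3.3 × 300 / 10 ≈ 99 mM) to the 100 mM already in the reaction. A worked sketch of that arithmetic, which also shows the polypeptide (monomer) concentration implied by quoting DNSN-1 as a dimer; the helper names are hypothetical.

```python
def final_koac_mM(base_mM, protein_mix_ul, protein_mix_mM, reaction_ul):
    """KOAc already in the reaction plus the KOAc carried in by the protein mix."""
    return base_mM + protein_mix_ul * protein_mix_mM / reaction_ul


def monomer_nM(dimer_nM):
    """Concentration of DNSN-1 polypeptides when the dimer is at dimer_nM."""
    return 2 * dimer_nM


koac = final_koac_mM(100, 3.3, 300, 10)  # 100 + 99, i.e. the nominal ~200 mM
print(round(koac), monomer_nM(30), monomer_nM(50))  # 199 60 100
```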


Glycerol gradient analysis 


Protein mixtures were prepared in 25 mM Hepes-KOH (pH 7.6), 200 mM KOAc, 0.02% IGEPAL CA-630, 1 mM DTT, 10 mM Mg(OAc)2, 0.5 mM AMP-PNP, and 500 nM leading DNA substrate and then incubated on ice for 1 hour. To assemble glycerol gradients, glycerol buffers at five different concentrations were used (10, 15, 20, 25, and 30%), each containing 25 mM Hepes-KOH (pH 7.6), 200 mM KOAc, 0.02% IGEPAL CA-630, 1 mM DTT, and 10 mM Mg(OAc)2. Gradients were assembled by consecutively layering 40 μl of each of the five glycerol buffers (from 30 to 10%) in an ultracentrifuge tube (P200915MGSG, Beckman). Then, 5 μl of the protein mixture was added to the top of the gradient before spinning for 1 hour at 55,000 rpm (249,000g in a Beckman TLS55 rotor) at 4°C. Ten fractions of 20 μl each were then collected from the top to the bottom of the gradient. After addition of 10 μl of 3X Laemmli buffer, the samples were analyzed by SDS-PAGE and immunoblotting.


Immunoblotting 


Protein samples were resolved by SDS-PAGE 
using the following systems: NuPAGE Novex 
4-12% Bis-Tris gels (NP0321 and WG1402A, 
Thermo Fisher Scientific) with NuPAGE MOPS 
SDS buffer (NP0001, Thermo Fisher Scientific) 
or NuPAGE MES SDS buffer (NP0002, Thermo 
Fisher Scientific); and NuPAGE Novex 3-8% 
Tris-Acetate gels (EA0375BOX and WG1602BOX, 
Thermo Fisher Scientific) with NuPAGE Tris- 
Acetate SDS buffer (LA0041, Thermo Fisher 
Scientific). The resolved proteins were either 
stained with colloidal Coomassie blue dye 
(InstantBlue, ab119211, Abcam) or were transferred onto a nitrocellulose iBlot2 membrane
(2NR290123-01, Thermo Fisher Scientific) 
with the iBlot2 Dry Transfer System (IB21001, 
Invitrogen), according to the manufacturer’s 
instructions. 

The antibodies used for immunoblotting in this study are described in table S5. Chemiluminescent signals were detected with an Azure Biosystems 300Q system using ECL Western Blotting Detection Reagent (17039552, GE Healthcare).
Preparation of antibody-coated magnetic beads

A slurry of activated magnetic beads (Dynabeads M-270 Epoxy; 14302D, Thermo Fisher Scientific) was prepared by resuspending 300 mg of beads in 10 ml of dimethylformamide. Each coupling reaction involved 425 μl of a slurry of activated magnetic beads, corresponding to ~1.4 × 10^9 beads. After the supernatant was removed, the beads were washed twice with 1 ml of 0.1 M NaPO4 pH 7.4. The beads were then incubated with 300 μg of rabbit IgG (S1265, Sigma-Aldrich), C. elegans MCM-6 antibody (SA417, MRC PPU Reagents and Services), or C. elegans SLD-5 antibody (SA419, MRC PPU Reagents and Services), together with 300 μl of 3 M (NH4)2SO4 and 0.1 M NaPO4 pH 7.4 to a total volume of 900 μl. The mixture was then incubated at 4°C for 2 days with rotation.

Subsequently, the supernatant was removed, and the beads were washed four times with 1 ml of PBS. The beads were then incubated for 10 min in 1 ml of PBS/0.5% IGEPAL CA-630 with rotation at room temperature, before being washed twice with 1 ml of PBS. Finally, the washed beads were resuspended in 900 μl of PBS containing 5 mg/ml BSA.
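The coupling recipe can be checked with two dilution calculations: the bead mass in 425 μl of the 300 mg/10 ml slurry, and the final ammonium sulfate concentration in the 900-μl coupling reaction. A hypothetical sketch, which ignores the volume occupied by the beads themselves:

```python
def bead_mass_mg(slurry_ul, beads_mg=300.0, solvent_ml=10.0):
    """Dry bead mass in a volume of slurry (300 mg of beads per 10 ml of DMF)."""
    return (slurry_ul / 1000.0) * beads_mg / solvent_ml


def diluted_conc(stock_conc, stock_ul, total_ul):
    """Concentration after diluting stock_ul of stock into total_ul (C1*V1 = C2*V2)."""
    return stock_conc * stock_ul / total_ul


print(round(bead_mass_mg(425), 2))  # 12.75 (mg of beads per coupling reaction)
print(diluted_conc(3.0, 300, 900))  # 1.0 (M (NH4)2SO4 in the coupling reaction)
```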


Yeast two-hybrid assays 


Two-hybrid analysis based on the Gal4 transcription factor was performed by cotransformation of derivatives of pGADT7 (630442, Takara; Gal4 activation domain; LEU2 marker) and pGBKT7 (630443, Takara; Gal4 DNA-binding domain; TRP1 marker) into the yeast strain PJ69-4A. For transformation and selection, synthetic complete dropout medium (SC medium) was used with the required supplements. For each assay, five independent transformed colonies were mixed together in dH2O and used to make serial dilutions, before 10-fold dilutions from 50,000 to 50 cells were spotted onto SC medium lacking tryptophan and leucine (selective for pGADT7 and pGBKT7 but nonselective for the two-hybrid interaction) or SC medium lacking tryptophan, leucine, and histidine (selective for the two-hybrid interaction).
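The spotting series runs from 50,000 to 50 cells in 10-fold steps, that is, four spots per plate. A small hypothetical helper that lists the cells per spot and, assuming a fixed spotting volume of 5 μl (not stated in the protocol), the cell density each suspension would need:

```python
def spot_series(start_cells=50_000, lowest_cells=50, fold=10):
    """Cells per spot for a serial dilution, from the most to least concentrated."""
    series = []
    cells = start_cells
    while cells >= lowest_cells:
        series.append(cells)
        cells //= fold
    return series


def cells_per_ml(cells_in_spot, spot_volume_ul=5.0):
    """Density needed so that spot_volume_ul (an assumed volume) holds cells_in_spot."""
    return cells_in_spot / (spot_volume_ul / 1000.0)


for n in spot_series():
    print(n, f"{cells_per_ml(n):.0e}")
```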


Cryo-EM sample preparation 


The DNA substrate was annealed by mixing equal volumes of the leading-strand template (5′-[Cy3]TAGAGTAGGAAGTGA[iBiodT]GGTAAGTGATTAGAGAATTGGAGAGTGTGTTTTTITTTTTTTTTTTTTTTTTTTTTTTTTTTTTIT*T*T*T*T*T; * = phosphorothioate linkage) and the lagging-strand template (5′-GGCAGGCAGGCAGGCACACACTCTCCAATTCTCTAATCACTTACCA[iBiodT]CACTTCCTACTCTA), each at 53 μM, in 25 mM HEPES-NaOH, pH 7.5, 150 mM NaOAc, 0.5 mM TCEP, and 2 mM Mg(OAc)2, before cooling gradually from approximately 90°C to room temperature.

CMG was mixed with a 1.3-fold molar excess of DNA in reconstitution buffer (25 mM HEPES-NaOH, pH 7.5, 100 mM NaOAc, 0.5 mM TCEP, 10 mM Mg(OAc)2, 0.6 mM AMP-PNP) in a 100-μl reaction volume and incubated on ice for 30 min. A mixture of TIM-1/TIPN-1 and DNSN-1 isoform c (each at 1.3-fold molar excess over CMG) in reconstitution buffer was then added, giving a final reaction volume of 250 μl and a final CMG concentration of 125 nM. The reaction was incubated on ice for a further 30 min before 100 μl was loaded onto each of two GraFix gradients (60) (the remainder was applied to a gradient lacking cross-linking agents). Glycerol gradients were prepared using a modified form of a previously described protocol (50) by layering equal volumes of a 10% glycerol buffer [40 mM HEPES-NaOH, pH 7.5, 100 mM NaOAc, 0.5 mM TCEP, 0.5 mM AMP-PNP, 10 mM Mg(OAc)2, 10% v/v glycerol] on top of a 30% glycerol buffer [40 mM HEPES-NaOH, pH 7.5, 100 mM NaOAc, 0.5 mM TCEP, 0.5 mM AMP-PNP, 10 mM Mg(OAc)2, 30% v/v glycerol, 0.22% glutaraldehyde, 2 mM bis(sulfosuccinimidyl) suberate] in a 2.2-ml TLS-55 centrifuge tube (Beranek Laborgerate). Gradients were prepared using a gradient-making station (Biocomp Instruments, Ltd.) before cooling on ice. Gradient sedimentation was performed by centrifugation in a Beckman TLS-55 rotor (200,000g, 2 hours, 4°C), and 100-μl fractions were collected manually. After analysis of each fraction by silver-stained SDS-PAGE, peak fractions (fractions 7 to 9 in fig. S11A) were pooled across both cross-linking gradients (total volume ~550 μl) and buffer-exchanged into buffer containing 25 mM HEPES-NaOH, pH 7.5, 150 mM NaOAc, 0.5 mM TCEP, 2 mM Mg(OAc)2, 0.1 mM AMP-PNP, and 0.005% v/v TWEEN 20 (Sigma, catalog no. P8341) during six rounds of ultrafiltration in 0.5-ml 30K MWCO centrifugal filters (Amicon) using a benchtop centrifuge (21,000g, 4°C, 1 min per round). The sample was concentrated to ~27 μl and immediately used for cryo-EM grid preparation on Quantifoil R2/2 copper 400-mesh grids commercially coated with a ~2-nm-thick ultrathin continuous carbon support; the grids were glow discharged using a PELCO easiGlow glow-discharge cleaning system (15 mA, 5 s) before vitrification in liquid-nitrogen-cooled ethane using a manual plunger situated in a cold room.
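As a worked example of the reconstitution stoichiometry: a final CMG concentration of 125 nM in 250 μl corresponds to about 31 pmol of CMG, and each 1.3-fold partner to about 41 pmol. The sketch below only does that unit arithmetic (nM × μl = fmol); the function names are hypothetical.

```python
def pmol(conc_nM, volume_ul):
    """Amount in pmol: nM x ul gives fmol, so divide by 1000."""
    return conc_nM * volume_ul / 1000.0


def conc_nM(amount_pmol, volume_ul):
    """Concentration in nM of amount_pmol dissolved in volume_ul."""
    return amount_pmol * 1000.0 / volume_ul


cmg_pmol = pmol(125, 250)      # CMG in the final 250-ul reaction
partner_pmol = 1.3 * cmg_pmol  # each partner at 1.3-fold molar excess
print(cmg_pmol, round(partner_pmol, 3))  # 31.25 40.625
print(conc_nM(cmg_pmol, 100))            # 312.5 (CMG during the 100-ul DNA step)
```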


Cryo-EM data collection 


A total of 10,825 raw micrographs were acquired in a single dataset using a 300-keV Titan Krios microscope (FEI) equipped with a K3 direct electron detector (Gatan) operated in electron-counting mode, using the EPU automated acquisition software (Thermo Fisher Scientific) with “faster acquisition” mode (AFIS) enabled. A slit width of 20 eV was used for the BioQuantum energy filter. Data were collected in super-resolution mode with a binning factor of two, yielding an effective pixel size of 1.09 Å/pixel (nominal magnification of 81,000×), using a defocus range of −1.0 to −2.5 μm and dose fractionation into 41 fractions per micrograph. An exposure time of 1.7 s achieved a dose of 40.1 e⁻/Å² per micrograph.
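The acquisition numbers can be cross-checked in a few lines: 40.1 e⁻/Å² over 41 fractions is about 0.98 e⁻/Å² per fraction (close to the 0.977 e⁻/Å² used for dose weighting), the dose rate over 1.7 s is about 23.6 e⁻/Å² per second, and binning super-resolution data by two means a super-resolution pixel of 0.545 Å, with a Nyquist limit of 2 × 1.09 = 2.18 Å at the final pixel size. A sketch with hypothetical helper names:

```python
def dose_per_fraction(total_dose, n_fractions):
    """Electrons per square angstrom delivered in each movie fraction."""
    return total_dose / n_fractions


def dose_rate(total_dose, exposure_s):
    """Electrons per square angstrom per second at the specimen."""
    return total_dose / exposure_s


def nyquist_angstrom(pixel_size):
    """Best resolution representable at a given pixel size (two pixels per period)."""
    return 2.0 * pixel_size


print(round(dose_per_fraction(40.1, 41), 3))  # 0.978
print(round(dose_rate(40.1, 1.7), 1))         # 23.6
print(round(1.09 / 2, 3), round(nyquist_angstrom(1.09), 2))  # 0.545 2.18
```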


Cryo-EM data processing 


RELION-4.0 (67) was used for data processing. The 41-fraction movies were aligned and dose-weighted (0.977 e⁻/Å² per fraction, 5 × 5 patches, 300-Å² B-factor) using RELION’s implementation of a MotionCor2-like program (62). CTF parameters were estimated using CTFFIND-4.1 (63). Particles were picked in a template-free manner using Gautomatch v0.56 (https://zhanglab.yale.edu/programs), leading to the extraction of ~4,540,000 particles using a box size of 410 Å. During extraction, data were down-sampled to a pixel size of 4.36 Å/pixel, and, to ease computational demand, the dataset was divided into three roughly equal parts (parts A to C), which were processed independently until completion of the first 3D classification without alignment (fig. S12). The dataset was subjected to two iterative rounds of 2D classification (regularization parameter T = 2), yielding a total of ~3,140,000 particles in well-aligned classes. These particles were submitted to a single round of 3D classification with alignment (T = 4), yielding well-aligned classes containing a total of ~3,040,000 particles, which represented the best-quality particles in our dataset; each of these classes contained CMG helicase plus heterogeneous occupancy of TIM-1/TIPN-1 and DNSN-1. These particles were subsequently re-extracted at a pixel size of 1.50 Å/pixel. After 10,000 particles from one-third of the dataset were used to determine optimized parameters, each third of the dataset was submitted for per-particle motion correction using Bayesian polishing (63); during this process, particles were re-extracted at a pixel size of 1.30 Å/pixel. The resulting particles were submitted for iterative beam-tilt and trefoil correction, anisotropic magnification correction, per-particle defocus correction, and per-micrograph astigmatism correction. These particles are henceforth referred to as the “polished dataset.”
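Because the 410-Å box is fixed in physical units, its size in pixels changes at each down-sampling and re-extraction step. The sketch below uses a hypothetical helper, `box_pixels` (not part of RELION), to convert the box to pixels at each pixel size used above; rounding to the nearest even integer is an assumption, since the text reports only the box size in Å.

```python
BOX_ANGSTROM = 410.0  # particle box size from the text


def box_pixels(box_angstrom: float, pixel_size: float) -> int:
    """Box edge in pixels, rounded to the nearest even integer (an assumption)."""
    return 2 * round(box_angstrom / (2.0 * pixel_size))


for pixel_size in (1.09, 4.36, 1.50, 1.30):
    n = box_pixels(BOX_ANGSTROM, pixel_size)
    print(f"{pixel_size:.2f} Å/pixel -> {n} px ({n * pixel_size:.1f} Å)")

# The initial down-sampling is exactly fourfold relative to the raw 1.09 Å/pixel data
print(f"down-sampling factor: {4.36 / 1.09:.1f}")
```

Under this assumption the box is 376 px at 1.09 Å/pixel and 94 px at 4.36 Å/pixel; the pixel counts actually used by the authors are not stated.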

To enrich the polished dataset for particles containing DNSN-1, signal subtraction was used to focus on the region of the map encompassing the homodimer of DNSN-1 folded domains plus the neighboring regions of MCM-3 and GINS that form the interface with the folded domain of DNSN-1. Subsequent 3D subclassification without alignment (T = 10) yielded a total of ~170,000 particles with good DNSN-1 occupancy. After 3D refinement and map sharpening, these particles yielded a reconstruction of the CMG/DNSN-1 complex at a resolution of 3.25 Å (all resolutions are henceforth quoted at the FSC = 0.143 criterion following 3D refinement and map sharpening with B-factors in the range −15 to −20 Å²). To further improve this reconstruction, multibody refinement was used to define the following four rigid bodies: MCM-2-7 NTDs/TIM-1/TIPN-1/dsDNA (3.20-Å resolution), MCM-2-7 AAA+/ssDNA (3.17-Å resolution), CDC-45/GINS (2.88-Å resolution), and the DNSN-1 dimer (3.44-Å resolution). These multibody-derived maps were used for building models of MCM-2-7, ssDNA, CDC-45, GINS, and DNSN-1.
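All resolutions here are read off where the Fourier shell correlation (FSC) curve falls below 0.143. A minimal stand-alone sketch of that readout, assuming an FSC curve sampled at increasing spatial frequencies (in Å⁻¹) and linear interpolation between the two bracketing shells. RELION performs this step internally, and its exact interpolation may differ; the curve in the usage line is made up.

```python
import numpy as np


def resolution_at_threshold(freq, fsc, threshold=0.143):
    """Resolution (Å) at which the FSC first drops below `threshold`.

    freq: increasing spatial frequencies in 1/Å; fsc: FSC value for each shell.
    Returns None if the curve never falls below the threshold.
    """
    freq = np.asarray(freq, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.flatnonzero(fsc < threshold)
    if below.size == 0:
        return None  # limited by the highest sampled frequency, not by the FSC
    i = below[0]
    if i == 0:
        raise ValueError("FSC is already below the threshold in the first shell")
    # Linear interpolation between the last shell above and the first shell below
    f0, f1 = freq[i - 1], freq[i]
    c0, c1 = fsc[i - 1], fsc[i]
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    return 1.0 / f_cross


# Made-up FSC curve for illustration
print(resolution_at_threshold([0.1, 0.2, 0.3, 0.4], [1.0, 0.9, 0.243, 0.043]))
```

For this made-up curve the crossing falls halfway between 0.3 and 0.4 Å⁻¹, which corresponds to about 2.86 Å.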

To isolate particles containing CMG in complex with both DNSN-1 and TIM-1/TIPN-1, the above CMG/DNSN-1 particles were further subclassified in 3D without alignment (T = 10) after signal subtraction was used to focus on the region encompassing TIM-1/TIPN-1/dsDNA. This approach produced a class containing ~33,900 particles, representing CMG/TIM-1/TIPN-1/DNSN-1 complexes (3.75-Å resolution). This reconstruction confirmed that DNSN-1 and TIM-1/TIPN-1 can associate with CMG simultaneously and that neither alters the other’s mode of interaction.

To improve the resolution of the TIM-1/TIPN-1 region of the complex, a comparable signal subtraction and 3D subclassification without alignment was performed, again focusing on TIM-1/TIPN-1/dsDNA, but this time using the complete polished dataset as the input. This approach yielded a total of ~922,000 particles representing complexes with good TIM-1/TIPN-1 occupancy (2.75-Å resolution). Multibody refinement was used to further improve the resolution of TIM-1/TIPN-1, defining TIM-1/TIPN-1/dsDNA/MCM N-tier as a single rigid body (2.64-Å resolution). After density farther than 25 Å from TIM-1/TIPN-1 was removed [UCSF Chimera (64), vop zone], this map was fitted to the reconstruction of the complex containing CMG/TIM-1/TIPN-1/DNSN-1 and used for building the TIM-1/TIPN-1/dsDNA model.
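Removing density beyond 25 Å of TIM-1/TIPN-1 amounts to a zone mask: every voxel farther than a cutoff from all model atoms is set to zero. The numpy sketch below illustrates the idea under simplifying assumptions (an orthogonal grid with cubic voxels, a known origin in Å, and brute-force distances). It is not Chimera's vop zone implementation, and full-size maps would need a spatial index such as a KD-tree.

```python
import numpy as np


def zone_mask(density, atom_xyz, radius, voxel_size=1.0, origin=(0.0, 0.0, 0.0)):
    """Copy of `density` with voxels farther than `radius` Å from every atom set to 0.

    density is indexed [z, y, x]; atom_xyz is an (n_atoms, 3) array of x, y, z in Å;
    origin is the x, y, z position (Å) of voxel [0, 0, 0].
    """
    nz, ny, nx = density.shape
    z, y, x = np.meshgrid(np.arange(nz), np.arange(ny), np.arange(nx), indexing="ij")
    # Cartesian coordinates (Å) of every voxel centre, in x, y, z order
    coords = np.stack([x, y, z], axis=-1).reshape(-1, 3) * voxel_size + np.asarray(origin)
    atoms = np.asarray(atom_xyz, dtype=float).reshape(-1, 3)
    # Brute-force squared distances, shape (n_voxels, n_atoms): small maps only
    d2 = ((coords[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
    keep = (d2.min(axis=1) <= radius ** 2).reshape(density.shape)
    return np.where(keep, density, 0.0)


# Toy example: a 10 x 10 x 10 map of ones and a single atom at its centre
masked = zone_mask(np.ones((10, 10, 10)), [[5.0, 5.0, 5.0]], radius=2.0)
print(int(masked.sum()))  # voxels kept within 2 Å of the atom: 33
```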

The local resolution maps for each of the 
reconstructions described above were gen- 
erated using RELION’s implementation of a 
local resolution calculation program. 


Model building and refinement 


To build the model of the CMG/TIM-1/TIPN-1/DNSN-1 complex bound to a fork DNA substrate, initial models for individual subunits were taken from the AlphaFold Protein Structure Database (65, 66) and fitted as rigid bodies into our cryo-EM density using UCSF Chimera (64); for MCM-2-7 subunits, the amino-terminal domains, AAA+ domains, and winged-helix (WH) domains were fitted separately. For both CDC-45 and the complete tetrameric GINS complex, models were predicted using AlphaFold-Multimer (36, 65, 67). Additionally, the dsDNA bound by TIM-1/TIPN-1 and the ssDNA bound by the MCM-2-7 AAA+ domains were built into density using COOT (68), although the resolution of our maps was insufficient to determine the exact position of the final base pair of dsDNA. COOT (68) was also used to add Zn²⁺ ions to the relevant MCM zinc-finger domains, and AMP-PNP-Mg²⁺ ligands were built where density was observed at the interfaces between MCM-3/MCM-5, MCM-7/MCM-3, MCM-4/MCM-7, and MCM-6/MCM-4 (fig. S14A).

During this process, the MCM-3 and MCM-6 WH domains were fitted to cryo-EM densities (fig. S14B) corresponding to their observed positions in a prior structure of the human CMG helicase [Protein Data Bank (PDB) ID 6XTX (69)]. Additional density for a third WH domain was observed across the CMG exit channel beside MCM-5 (fig. S14C); however, the local resolution was insufficient to confidently assign it to a particular MCM-2-7 subunit. A region of α-helical density was observed to interact with the PSF-2 and SLD-5 subunits of the CMG helicase; AlphaFold-Multimer (36, 70) confidently predicted the association of an amino-terminal α helix of DNSN-1 at this location, and the local resolution of our cryo-EM reconstruction was sufficient to confidently build DNSN-1 residues 4 to 19 into this density (fig. S13, A to D). Similarly, α-helical cryo-EM density with an unambiguous tryptophan side chain was observed interacting with the MCM-3 AAA+ domain (fig. S13E); this density is absent from complexes lacking DNSN-1 (fig. S12). AlphaFold-Multimer (36, 70) predicted the association of DNSN-1 residues 419 to 431 at this location, placing a DNSN-1 tryptophan residue in the tryptophan cryo-EM density (fig. S13, E to H). Because no other tryptophan residue in any subunit of our complex could account for the tryptophan observed in our cryo-EM density, we were able to confidently build DNSN-1 residues 417 to 435. On the basis of distance constraints, we determined that this DNSN-1 α helix belongs to the same protomer that forms the interface with GINS and the MCM-3 amino-terminal domain through its folded domain.

Finally, additional density with some α-helical character was observed to bind the TIM-1 amino terminus; AlphaFold-Multimer (36, 70) predicted an interaction between this region of TIM-1 and part of the amino-terminal extension of MCM-2 (fig. S14, E to G), placing a Met-Tyr pair where we observe cryo-EM density for two bulky side chains. This enabled us to build MCM-2 residues 100 to 111. Additional density was observed nearby between TIM-1 residues 14 to 35 and MCM-2 residues 142 to 152; however, this density was of insufficient resolution to confidently assign sequence (fig. S14H).

To improve the fit of the model to our cryo-EM density, COOT (68) was used to remove regions of the proteins for which density was not resolved; an iterative combination of ISOLDE (77) [within UCSF ChimeraX (70, 72)] and COOT was then used to adjust and refine models against density on a residue-by-residue basis, using the best-resolved map for any given region of the complex.

After the completion of model building, the model was refined against the cryo-EM reconstruction encompassing the complete CMG/TIM-1/TIPN-1/DNSN-1 complex (3.75-Å resolution) using Phenix real-space refinement (73), with secondary-structure restraints enabled, the input model used as a reference to generate restraints with a sigma value of 0.1, and global minimization performed with a nonbonded weight of 2000 and a weight of 0.5. Model validation was performed using the MolProbity server (74), Phenix comprehensive validation (cryo-EM) (73), and the wwPDB OneDep validation server (75). Model-to-map FSCs were plotted using Xmipp (76) after we used EMAN’s pdb2mrc (77) to generate a map from our model and removed solvent density from the relevant full maps and half-maps by multiplication in RELION’s relion_image_handler (77).
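A model-to-map FSC compares a map computed from the model with the experimental map after both have been masked by multiplication. A compact numpy sketch of the shell-binned FSC, assuming two cubic maps of equal size on the same grid; it follows the standard FSC definition and is not the Xmipp or RELION code. The random maps are for illustration only.

```python
import numpy as np


def fsc_curve(map1, map2, mask=None):
    """Fourier shell correlation between two equally sized cubic maps.

    If `mask` is given, both maps are multiplied by it first (solvent removal by
    multiplication). Shell k collects Fourier voxels with radius in [k, k + 1);
    its spatial frequency is k / (box_size * pixel_size) in 1/Å.
    """
    if mask is not None:
        map1, map2 = map1 * mask, map2 * mask
    n = map1.shape[0]
    f1, f2 = np.fft.fftn(map1), np.fft.fftn(map2)
    k = np.fft.ifftshift(np.arange(n) - n // 2)  # integer frequencies in FFT order
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.floor(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    inside = shell < n_shells
    idx = shell[inside]
    num = np.bincount(idx, weights=(f1 * np.conj(f2)).real[inside], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[inside], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[inside], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    return np.divide(num, denom, out=np.zeros(n_shells), where=denom > 0)


# Illustration: a random "map" and a noisy copy of it
rng = np.random.default_rng(0)
half_a = rng.normal(size=(16, 16, 16))
half_b = half_a + 0.5 * rng.normal(size=(16, 16, 16))
print(np.round(fsc_curve(half_a, half_b), 2))
```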


In silico modeling of protein:protein 
interactions using AlphaFold-Multimer 


To predict the interaction between TIM-1 and the amino terminus of MCM-2, AlphaFold-Multimer (36, 70) was used to model MCM-2 residues 82 to 171 and TIM-1 residues 1 to 947 (with residues 521 to 834 replaced by a GGSGGSGGSGGS linker).
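The TIM-1 construct used for this prediction can be assembled programmatically. In the sketch below, `replace_with_linker` and `tim1_construct` are hypothetical helpers. The TIM-1 sequence itself is not given in the text, so a placeholder string stands in for it.

```python
LINKER = "GGSGGSGGSGGS"  # linker sequence from the text


def replace_with_linker(seq: str, start: int, end: int, linker: str = LINKER) -> str:
    """Replace residues start..end (1-based, inclusive) of `seq` with `linker`."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("residue range lies outside the sequence")
    return seq[: start - 1] + linker + seq[end:]


def tim1_construct(tim1_full: str) -> str:
    """TIM-1 residues 1 to 947, with residues 521 to 834 replaced by the linker."""
    return replace_with_linker(tim1_full[:947], 521, 834)


# Placeholder sequence of the right length (not the real TIM-1 sequence)
placeholder = "M" + "A" * 946
construct = tim1_construct(placeholder)
print(len(construct))  # 520 + 12 + 113 residues
```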

To predict the interaction between the 
C. elegans GINS tetramer and the amino 
terminus of DNSN-1, AlphaFold-Multimer 
(36, 65, 67) was used to model full-length 
PSF-1, PSF-2, PSF-3, and SLD-5, as well as 
DNSN-1 (residues 1 to 90). To predict the in- 
teraction between DONSON and GINS in hu- 
mans, AlphaFold-Multimer (36, 65, 67) was 
used to model PSF1, PSF2, PSF3, SLD5, and 
two copies of DONSON, all full-length. 

To predict the interaction between C. elegans DNSN-1 and the MCM-3 AAA+ domain, AlphaFold-Multimer (36, 65, 67) was used to model full-length MCM-3 and DNSN-1. To predict the interaction between DONSON and MCM3 in humans, AlphaFold-Multimer (36, 65, 67) was used to model MCM3 and two copies of DONSON, all full-length.

To predict the existence of a C. elegans DNSN-1 homodimer, AlphaFold-Multimer (36, 65, 67) was used to model two copies of full-length DNSN-1. To predict the interaction between C. elegans DNSN-1 and MUS-101, AlphaFold-Multimer (36, 65, 67) was used to model two copies of full-length DNSN-1 plus MUS-101 residues 434 to 561, encompassing the MUS-101 BRCT3 domain. To predict the interaction between human DONSON and TOPBP1, AlphaFold-Multimer (36, 65, 67) was used to model two copies of full-length DONSON with full-length TOPBP1.


Statistics and reproducibility


GraphPad Prism (GraphPad Software) was used to perform statistical analysis. For the EdU cell proliferation assay in Fig. 2D and the DNA combing assay in Fig. 2F, data were analyzed using a Kruskal-Wallis test followed by Dunn’s test. In Fig. 8C, data were analyzed using a paired two-tailed t test.
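The same tests can be reproduced outside GraphPad Prism. A minimal sketch using SciPy, assuming independent groups for the Kruskal-Wallis test and matched pairs for the t test. Dunn’s post hoc comparisons are not part of SciPy’s core API (the third-party scikit-posthocs package provides `posthoc_dunn`), so only the omnibus test is shown. The data are made up for illustration.

```python
from scipy import stats


def kruskal_wallis(*groups):
    """Omnibus Kruskal-Wallis H test across independent groups."""
    result = stats.kruskal(*groups)
    return result.statistic, result.pvalue


def paired_t_two_tailed(first, second):
    """Paired t test; SciPy's default alternative is two-sided."""
    result = stats.ttest_rel(first, second)
    return result.statistic, result.pvalue


# Made-up measurements for illustration only
h, p_kw = kruskal_wallis([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15])
t, p_t = paired_t_two_tailed([1, 2, 3, 4, 5], [2, 3, 4, 5, 7])
print(f"H = {h:.2f}, p = {p_kw:.4f}; t = {t:.2f}, p = {p_t:.4f}")
```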

In microscopy experiments involving worms 
treated with RNAi, at least five embryos were 
examined for each treatment and found to 
behave similarly to each other, unless stated 
otherwise in the text. 
ICY MOONS


Endogenous CO2 ice mixture on the surface of Europa and no detection of plume activity


G. L. Villanueva, H. B. Hammel, S. N. Milam, S. Faggi, V. Kofman, L. Roth, K. P. Hand, L. Paganini, J. Stansberry, J. Spencer, S. Protopapa, G. Strazzulla, G. Cruz-Mermy, C. R. Glein, R. Cartwright, G. Liuzzi


Jupiter's moon Europa has a subsurface ocean beneath an icy crust. Conditions within the ocean are unknown, and it is unclear whether it is connected to the surface. We observed Europa with the James Webb Space Telescope (JWST) to search for active release of material by probing its surface and atmosphere. A search for plumes yielded no detection of water, carbon monoxide, methanol, ethane, or methane fluorescence emissions. Four spectral features of carbon dioxide (CO2) ice were detected; their spectral shapes and distribution across Europa’s surface indicate that the CO2 is mixed with other compounds and concentrated in Tara Regio. The CO2 absorption is consistent with an isotopic ratio of ¹²C/¹³C = 83 ± 19. We interpret these observations as indicating that carbon is sourced from within Europa.


Jupiter’s moon Europa is thought to host a subsurface ocean beneath an icy surface crust, which has a thickness estimated to be between 23 and 47 km (1). Spacecraft measurements have shown that Europa has an induced magnetic field, which has been interpreted as the result of a deep, salty ocean (2, 3). Smaller liquid-water bodies might also be present within the ice shell (4). Europa’s surface is one of the youngest in the Solar System; the near absence of impact craters indicates an age in the range of 40 million to 90 million years (5). The extensive resurfacing is probably due both to tidal heating sustained by orbital resonance, which could power cryovolcanism (6) (the eruption of water and volatiles through an ice crust at freezing temperatures), and to the upwelling of material to form ice domes (7). These processes would provide pathways for subsurface materials to reach the surface, where they could be observed. Surface materials could be either endogenous (from within Europa) or exogenous (delivered by impacts or from Jupiter’s magnetosphere); distinguishing between these possibilities is required to infer properties of the subsurface ocean (8). Europa’s surface composition is dominated by water ice (9), with a complex mixture of other compounds, including salts (e.g., NaCl and hydrated sulfates) (10, 11) and carbon- and sulfur-bearing molecular species (12–14). The diversity of observed species leads to uncertainty about whether the material on Europa’s surface is endogenous or exogenous.


Searches for plume activity 


A possible indication of endogenic material on Europa would be plumes: ejections of large amounts of material through cracks in the ice opened by strong tidal forces. Evidence for plumes has been reported from ultraviolet observations of auroral emission lines of hydrogen and oxygen in the southern hemisphere, which were interpreted as the result of localized plumes containing up to 1 × 10³² molecules of H2O (15). Such plume activity has not been confirmed by subsequent observations despite several attempts. Magnetic-field and plasma-wave observations from a close spacecraft flyby of Europa were interpreted as being caused by a plume (16). Transit observations of the Europa limb have been interpreted as localized excess emission (17) or, alternatively, as statistical noise rather than plume activity (18). Another study identified one tentative detection (at the 3σ level) of water-vapor plume activity within an otherwise quiescent period (19).

To search for active sources on Europa, we probed its atmosphere and surface using JWST (20), performing imaging with the Near-Infrared Camera (NIRCam) and spectroscopy in the 2.4- to 5.2-µm spectral range (Fig. 1) with the Near-Infrared Spectrograph (NIRSpec) at a resolving power of ~2700. The observations were carried out on 23 November 2022 to sample Europa’s leading hemisphere (27). We searched for plume activity by probing the narrow molecular infrared features that fluoresce in sunlight. We targeted the strong fundamental bands of H2O at 2.7 µm; of CH4, C2H6, and CH3OH in the C–H stretch region (near 3.3 µm); and of CO at 4.7 µm. We extracted an integrated spectrum across a 1.3-arc sec-diameter region centered on Europa (extending 500 km beyond its radius), sampling the extended region beyond the moon’s 1-arc sec diameter. We then removed solar and ice absorption features and compared the resulting residual spectra (fig. S1) with line-by-line fluorescence models by performing retrievals (27). We assumed a rotational excitation temperature of 25 K in the models, which is similar to the value measured in the plume of Enceladus (22).

None of the targeted molecules were detected in the Europa spectrum; the resulting 3σ upper limits, in units of 10³⁰ molecules, are <35 for H2O, <18 for CH4, <18 for C2H6, <93 for CH3OH, and <14 for CO. Assuming an outgassing velocity of 583 m s⁻¹ (79) and isotropic outflow, the upper limit for water (<35 × 10³⁰ H2O molecules) corresponds to a water-vapor plume activity of <1 × 10²⁸ molecules s⁻¹ (<300 kg s⁻¹). This upper limit for water is a factor of two lower than the previous tentative detection in the leading hemisphere [(70 ± 22) × 10³⁰ H2O molecules (19)], a factor of four lower than that inferred from auroral ultraviolet emission lines on the anti-Jovian hemisphere [(130 ± 30) × 10³⁰ H2O molecules (15)], and a factor of five lower than the median value [180 × 10³⁰ H2O molecules] reported for plumes on the trailing hemisphere (17). The JWST observations of the leading hemisphere set a limit on sustained water-plume activity on Europa; if any plume activity occurs on Europa today, it must be localized and weak (16), infrequent and inactive during our observations, or devoid of the volatile gases that we searched for.
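The conversion from a molecule count to a production rate can be sketched with a simple residence-time argument: in steady, isotropic outflow, molecules remain in the aperture for roughly R/v, so N ≈ Q·R/v. The Python below assumes a Europa radius of 1560.8 km and an aperture radius 500 km beyond the limb. It reproduces the order of magnitude quoted above but is not the retrieval the authors used. The comparison factors follow from dividing the earlier estimates by the 35 × 10³⁰ limit.

```python
# Assumed constants (not taken from the text)
EUROPA_RADIUS_KM = 1560.8
AMU_KG = 1.66053906660e-27
H2O_MASS_AMU = 18.015


def production_rate(n_molecules, velocity_m_s, aperture_radius_m):
    """Steady isotropic outflow: N = Q * t_res with t_res = R / v, so Q = N * v / R."""
    return n_molecules * velocity_m_s / aperture_radius_m


n_h2o = 35e30                               # 3σ upper limit on H2O molecules (text)
v = 583.0                                   # outgassing velocity in m/s (text)
r_ap = (EUROPA_RADIUS_KM + 500.0) * 1e3     # aperture radius in m (assumed geometry)

q = production_rate(n_h2o, v, r_ap)         # molecules per second
mass_rate = q * H2O_MASS_AMU * AMU_KG       # kilograms per second
print(f"Q < {q:.2e} molecules/s, < {mass_rate:.0f} kg/s")

# Earlier water estimates quoted in the text, in units of 1e30 molecules
for label, value in [("leading-hemisphere tentative detection", 70),
                     ("anti-Jovian auroral UV", 130),
                     ("trailing-hemisphere median", 180)]:
    print(f"{label}: {value / 35:.1f} times the JWST limit")
```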


CO2 detection and isotope ratio 


An alternative way to probe for endogenic sources on Europa is to search for recently deposited material on its surface. The NIRCam images (Fig. 2A), obtained by combining the observations with filters F140M [1.331 to 1.479 µm] and F212N [2.109 to 2.134 µm], show enhanced brightness in Tara Regio (10°S, 75°W), which is an area of chaos terrain, and also on the anti-Jovian side of Europa (180°W). Chaos terrain consists of irregular groups of large blocks, which are thought to be related to an active geological process. Using the contemporaneously collected NIRSpec spectra of the leading hemisphere, we searched for evidence of CO, CH4, or CH3OH ices but did not detect them. It has been suggested that CO2 ice on Europa is concentrated on the anti-Jovian and trailing sides of its surface (12); however, the absorption bands were only marginally resolved in earlier data (23). Many non-water ice bands have previously been mapped at hemisphere scales, including H2O2 at 3.5 µm (24), CO2 at 4.3 µm (12), and SO2 at 4.0 µm (12, 25). If CO2 is associated with endogenic landforms, then it would provide information on Europa’s interior, such as the carbon content of the ocean. Theoretical models have predicted that the ocean contains dissolved CO2 and other carbonate species (26), yet observations in the near infrared (1 to 2.5 µm) did not detect CO2 (27) on Europa, so its presence and distribution remain unclear.

Fig. 1 (caption, continued): …as (B) except for the double-peaked CO2 feature at 4.27 µm. The blue line is a model of a CO2:H2O:CH3OH [1:0.8:0.9] mixture at 114 K. The shape of the observed spectrum is fitted with a combination of the blue and purple models. The peak position and width of the feature can alternatively be reproduced with a model (dashed red line) of carbonic acid synthesized in a CO2:H2O ice mixture (ratio 5:1) exposed to ionizing radiation.

In the JWST data, we detected multiple features due to CO2 ice on Europa: a narrow absorption band at 2.7 µm (Fig. 1B), a double-peaked absorption band at 4.25 and 4.27 µm (Fig. 1C), and an absorption due to the rarer isotopologue ¹³CO2 at 4.38 µm (fig. S2C). ¹³CO2 has previously been observed on two of Saturn’s moons, Phoebe and Iapetus (28), but not on Europa. From the ratio of the ¹²CO2 and ¹³CO2 features, we estimate the carbon isotopic ratio ¹²C/¹³C = 83 ± 19 (1σ) (21). This value is consistent with the terrestrial inorganic standard [Vienna Peedee Belemnite (VPDB)], which has ¹²C/¹³C = 89 (29). It is also consistent with measured values for Iapetus [¹²C/¹³C = 83 ± 8 (28)] and with the range of ¹²C/¹³C ratios (between 83 and 85) measured from carbonate minerals in Ivuna-type carbonaceous chondrite meteorites and samples of the asteroid Ryugu (30). These values could reflect primordial CO2 (present in the protosolar nebula), which could have been incorporated into Europa if it assembled from materials that formed at temperatures below ~80 K (37). Alternatively, the carbon in Europa’s CO2 could have been inherited from accreted primitive organic matter in the Solar System, which has ¹²C/¹³C = 90 ± 1 (32). The ratio of ¹²C to ¹³C is used as a biosignature on Earth (33), where localized carbon sources and reservoirs can have higher ¹²C/¹³C ratios (up to 104) as a result of biogenic processes (29). For C isotopes to serve as a biosignature on Europa, the isotopic fractionation between reduced carbon and CO2 would need to be determined (34), which we cannot measure with these data; therefore, we cannot distinguish between abiotic and biogenic sources.
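The consistency statements above can be checked by expressing each difference in units of the combined uncertainty. A sketch, assuming independent Gaussian errors added in quadrature; reference values quoted without an uncertainty are treated as exact.

```python
import math

# 12C/13C for Europa's CO2 and its 1σ uncertainty (from the text)
EUROPA = (83.0, 19.0)

# Comparison values quoted in the text; 0.0 means no uncertainty was given
REFERENCES = {
    "VPDB standard": (89.0, 0.0),
    "Iapetus": (83.0, 8.0),
    "primitive organic matter": (90.0, 1.0),
}


def sigma_separation(a, b):
    """|a - b| in units of the combined (quadrature) 1σ uncertainty."""
    (va, sa), (vb, sb) = a, b
    return abs(va - vb) / math.hypot(sa, sb)


separations = {name: sigma_separation(EUROPA, ref) for name, ref in REFERENCES.items()}
for name, s in separations.items():
    print(f"{name}: {s:.2f} sigma")

# The chondrite and Ryugu carbonate range (83 to 85) versus Europa's 1σ interval
lo, hi = EUROPA[0] - EUROPA[1], EUROPA[0] + EUROPA[1]
print("overlaps carbonate range:", lo <= 85.0 and hi >= 83.0)
```

Because the Europa uncertainty dominates, every comparison falls well within 1σ, which is why the measured ratio is described as consistent with each of these reservoirs.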


Nature and distribution of the CO2 ice


The observed 4.25-µm absorption band due to CO2 has a double-peaked structure, which differs from the single-peaked band of crystalline CO2 ice (Fig. 1C). The synthetic spectrum of crystalline CO2 ice in Fig. 1C was computed with the surface model of the Planetary Spectrum Generator (PSG) (21, 35). The best match we found to this double-peaked shape (Fig. 1C) was a laboratory spectrum of a mixture of CO2, H2O, and CH3OH in the ratio 1:0.8:0.9, respectively, measured at a temperature of 114 K (36). The temperature of this laboratory spectrum is within the range previously measured for different hemispheres of Europa (90 to 130 K) (37). This could indicate that CO2 is stored in a water- and organic-rich matrix on Europa, yet we did not detect any bands in our spectra that were due to CH3OH ice or other organic molecules. We regard methanol as a proxy for the effect of any organics on the band position of CO2, and several other effects could also produce shifts in the CO2 fundamental band (27, 38). A blue-shifted CO2 peak has previously been observed on Ganymede and Callisto (39, 40) but did not show the same double-peak signature that we observe on Europa, perhaps because of differing spectral resolutions. The closest match to the CO2 band detected on Callisto and Ganymede was a laboratory spectrum of carbonic acid (H2CO3) synthesized in a CO2:H2O ice mixture (in the ratio 5:1) and then exposed to ionizing radiation in the form of 5-keV electrons (42). Similar laboratory irradiation experiments have been reported for Europa-like conditions (42). Figure 1C shows a synthetic spectrum based on the carbonic acid experiment (47), which reproduces the width and location of the band but not its double peak.

22 September 2023

To test a possible matrix for the observed CO2, we measured spectra of oceanic salt evaporite with a thin CO2 ice film deposited onto the salts at different temperatures while being irradiated (27). In the experiments, the feature at 4.25 µm appeared after irradiation of the salts, whereas the feature at 4.27 µm was present in freshly deposited CO2 (fig. S2C). We therefore interpret the 4.25-µm band as likely indicating CO2 that was either adsorbed onto salts or captured within them.

Fig. 2. Distribution of CO2 on Europa. (A) A false-color image of Europa as it appeared during the JWST observations (21). The image is oversampled at 0.031 arc sec per pixel; the diffraction-limited resolution is ~0.08 arc sec at these wavelengths. (B) Distribution of the band intensity of the CO2 2.7-µm feature, determined by fitting a model of CO2 crystalline ice to the spectrum at each location. The white circle indicates the size of Europa in (A). (C and D) The 4.25- and 4.27-µm double-peaked feature was modeled as a combination of two components: CO2 crystalline ice [band intensity shown in (C)] and CO2 noncrystalline ice [band intensity shown in (D)]. (B), (C), and (D) share the same color bar but with different maximum/middle values of 0.70/0.35, 4.20/2.10, and 7.10/3.55 nm, respectively.

We searched for heterogeneities in CO2 ice abundance and structure by mapping the strengths of the three ¹²CO2 peaks across the observed hemisphere of Europa (Fig. 2); the ¹³CO2 feature is too weak for mapping. At each spatial point, we fitted a model of crystalline CO2 ice to the 2.7-µm feature, whereas we modeled the 4.25- and 4.27-µm double-peaked feature as a combination of two components: crystalline CO2 ice (using the model described above) and a CO2 excess. The CO2 excess model was constructed by subtracting the synthetic spectrum of the mixture of CO2, H2O, and organic molecules from the crystalline CO2 spectrum (Fig. 1C).
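A minimal sketch of this per-pixel two-component decomposition, assuming the spectra have already been converted to absorption depth and that the two components combine linearly. The Gaussian templates are hypothetical stand-ins for the PSG crystalline-ice model and the CO2-excess model. Linear least squares is our choice of solver, not necessarily the authors'.

```python
import numpy as np


def gaussian_band(wl, center, width, depth=1.0):
    """Hypothetical Gaussian absorption-depth template (not the PSG or lab spectra)."""
    return depth * np.exp(-0.5 * ((wl - center) / width) ** 2)


def fit_component_maps(cube, template_a, template_b):
    """Least-squares amplitudes of two templates for every spatial pixel.

    cube: absorption depth with shape (ny, nx, n_wave); templates: shape (n_wave,).
    Returns two amplitude maps of shape (ny, nx).
    """
    ny, nx, nw = cube.shape
    design = np.column_stack([template_a, template_b])          # (n_wave, 2)
    rhs = cube.reshape(-1, nw).T                                # (n_wave, n_pix)
    coeffs, *_ = np.linalg.lstsq(design, rhs, rcond=None)       # (2, n_pix)
    return coeffs[0].reshape(ny, nx), coeffs[1].reshape(ny, nx)


# Synthetic 3 x 4 pixel cube built from known amplitudes (illustration only)
rng = np.random.default_rng(0)
wl = np.linspace(4.20, 4.32, 121)                  # µm
crystalline = gaussian_band(wl, 4.270, 0.004)      # stand-in for crystalline CO2
excess = gaussian_band(wl, 4.250, 0.004)           # stand-in for the CO2 excess
true_a = rng.uniform(0.0, 1.0, size=(3, 4))
true_b = rng.uniform(0.0, 1.0, size=(3, 4))
cube = true_a[..., None] * crystalline + true_b[..., None] * excess
a_map, b_map = fit_component_maps(cube, crystalline, excess)
print(np.round(a_map - true_a, 6))
```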

All three bands are strongest in the chaos terrain of Tara Regio, and the 2.7- and 4.27-µm CO2 bands have similar distributions (Fig. 2). The 4.25-µm band has a larger dynamic range, with almost no detection in the northern regions and a lower abundance between Tara Regio and the anti-Jovian regions (Fig. 2D). The most abundant surface CO2 appears to be in Tara Regio, potentially indicating that this geologically distinct region is associated with an endogenous source of CO2. The distribution of the 4.25-µm CO2 band is similar to that noted in previous observations of irradiated NaCl on Europa (17), whereas the 2.7- and 4.27-µm bands are distributed more broadly across Europa’s surface. This is consistent with our interpretation that the 4.25-µm feature is due to CO2 mixed with salts or produced through irradiation of carbonate salts.


An endogenous source of CO2 


CO2 has been observed on a wide variety of Solar System objects and can have either native (endogenous) or nonnative (exogenous) origins. The localized CO2 that we observe on Europa could be related to a disrupted surface, with differences in surface grain size affecting the strength of the CO2 absorption across the surface (43). Exogenous explanations for the observed CO2 on Europa are possible, but an exogenous source would likely produce a more global distribution, not the observed local concentration that is associated with salts (which are presumably endogenous). CO2 ice is also localized on Enceladus, where it is known to be endogenous (44). Exogenous interplanetary dust grains might deliver carbonaceous material to Europa’s icy surface, which could then yield CO2 through radiolysis (42), but no silicate features indicative of such exogenous material have been reported for Europa (25). Given the association of CO2 with NaCl and our laboratory results (27), we conclude that the most likely origin of the observed CO2 is endogenous, at least within Tara Regio.

We consider several possible endogenous sources of CO2. One possibility is that aqueous solutions rich in CO2 are present in the subsurface. Such solutions could be present if a long-lived reservoir, such as Europa’s ocean, has a sufficiently low pH (26), or if fluids migrating through Europa’s ice shell incorporate CO2 derived from destabilized dry ice or CO2 clathrate hydrate (45).

A second potential source of CO2 could be carbonate-bearing fluids (e.g., NaHCO3 or Na2CO3 dissolved in water). Enceladus has a carbonate-rich ocean that degasses CO2 (46); some of that degassed CO2 freezes out on the surface (47). An analogous process could occur on Europa. Alternatively, endogenous carbonates could react with acidic compounds (e.g., H2SO4) at or near the surface to produce CO2, or extruded brines [if they contain (bi)carbonate salts] could produce CO2 during radiation processing (48).

A third possibility is that the carbon in the CO2 came from organic compounds that were originally dissolved or suspended in a subsurface liquid-water reservoir and were later converted to CO2. CO2 might be generated by irradiation at the surface, when material sourced from Europa’s interior, rich in carbonate salts and/or organics mixed with H2O, is bombarded by charged particles trapped in Jupiter’s magnetosphere (49). A similar process has been proposed to form hydrogen peroxide (H2O2) from H2O ice; H2O2 has previously been observed to be enriched at low latitudes across Europa’s leading and anti-Jovian quadrants, including within the boundaries of Tara Regio (50). Because the surface environment of Europa is strongly oxidizing, CO2 would be produced by radiation-driven oxidation of reduced carbon species (organics) on Europa’s surface; the lack of detectable CO could be an indication of that process (49).

Regardless of the specific source species of CO2, we regard the presence of CO2 in a region with previous indications of subsurface liquid water as evidence of carbon availability in Europa’s interior.
ICY MOONS


The distribution of CO2 on Europa indicates an internal source of carbon


Samantha K. Trumbo and Michael E. Brown


Jupiter's moon Europa has a subsurface ocean, the chemistry of which is largely unknown. Carbon dioxide (CO2) has previously been detected on the surface of Europa, but it was not possible to determine whether it originated from subsurface ocean chemistry, was delivered by impacts, or was produced on the surface by radiation processing of impact-delivered material. We mapped the distribution of CO2 on Europa using observations obtained with the James Webb Space Telescope (JWST). We found a concentration of CO2 within Tara Regio, a recently resurfaced terrain. This indicates that the CO2 is derived from an internal carbon source. We propose that the CO2 formed in the internal ocean, although we cannot rule out formation on the surface through radiolytic conversion of ocean-derived organics or carbonates.


Beneath a crust of water ice, Jupiter’s moon Europa has an internal ocean of salty liquid water above a rocky seafloor (1, 2), a potentially habitable environment. Assessing the ocean’s habitability requires knowledge of its chemistry, including the abundances of biologically essential elements, its bulk oxidation state, and available chemical energy sources (3). The ocean’s carbon content is poorly constrained. Previous work has identified carbon dioxide (CO2) on Europa’s geologically young surface through its infrared ν3 asymmetric stretch fundamental band (4–6); however, there was either insufficient spatial resolution to map its distribution or too many limitations caused by noise and artifacts to determine the source of the CO2 (5). Attempts to map the CO2 on Europa indicated a possible association with dark material that is potentially endogenic (derived from internal processes), but the results were ambiguous and difficult to interpret (4). It has therefore been impossible to distinguish between several possible origins of the CO2: geologically associated CO2 sourced from the ocean; geologically associated CO2 produced radiolytically by charged-particle irradiation of native organics or carbonates at the surface; or exogenic (outside Europa) sources, such as delivery by impacts of CO2-rich bodies or radiolytic conversion of meteorite-delivered carbonaceous material (7, 8).


CO2 distribution on Europa


We analyzed observations of CO2 on Europa obtained with the JWST Near-Infrared Spectrograph (NIRSpec) integral field unit (IFU) on 23 November 2022. The spectra have a resolving power, R, of ~2,700, sufficient to resolve a double-peaked structure within Europa’s CO2 ν3 band, which we find has minima at 4.249 ± 0.001 µm and 4.268 ± 0.002 µm (Fig. 1). We measured the integrated band area (equivalent width) across the entire absorption feature in each spatial pixel, then mapped the resulting band strengths across the surface (9).
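The integrated band area (equivalent width) is the integral of the fractional absorption depth across the band, W = ∫(1 − F/F_c) dλ. A minimal numpy sketch, assuming a linear continuum fitted to the edges of the spectral window and a trapezoidal integral; the minimum finder assumes continuum-normalized, noise-free flux. Both functions are illustrations, not the authors' pipeline, and the synthetic band shape is made up.

```python
import numpy as np


def equivalent_width(wavelength, flux, n_edge=5):
    """Integrated band area W = ∫ (1 - F/F_c) dλ, in the units of `wavelength`.

    The continuum F_c is a straight line fitted to the first and last `n_edge`
    samples, so the window must extend beyond the band on both sides.
    """
    wl = np.asarray(wavelength, dtype=float)
    f = np.asarray(flux, dtype=float)
    edges = np.r_[0:n_edge, len(wl) - n_edge:len(wl)]
    slope, intercept = np.polyfit(wl[edges], f[edges], 1)
    depth = 1.0 - f / (slope * wl + intercept)
    # Trapezoidal rule, written out explicitly
    return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(wl)))


def band_minima(wavelength, flux, min_depth=0.01):
    """Wavelengths of local flux minima at least `min_depth` below a unit continuum."""
    wl = np.asarray(wavelength, dtype=float)
    f = np.asarray(flux, dtype=float)
    i = np.arange(1, len(f) - 1)
    is_min = (f[i] < f[i - 1]) & (f[i] < f[i + 1]) & (f[i] <= 1.0 - min_depth)
    return wl[i[is_min]]


# Synthetic double-peaked band in µm (made-up depths and widths)
wl = np.linspace(4.20, 4.32, 241)
flux = (1.0
        - 0.3 * np.exp(-0.5 * ((wl - 4.249) / 0.004) ** 2)
        - 0.2 * np.exp(-0.5 * ((wl - 4.268) / 0.004) ** 2))
print(f"W = {equivalent_width(wl, flux) * 1e3:.2f} nm")
print("minima (µm):", np.round(band_minima(wl, flux), 3))
```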

The strongest overall CO2 absorptions are located in Tara Regio (~10°S, 75°W) (Fig. 2A), a roughly 1,800-km-diameter area of geologically disrupted resurfaced material (known as chaos terrain) that is among the youngest on Europa’s surface (10, 11). The formation of chaos terrain is not well understood; however, proposed explanations all involve large-scale disruption of the surface ice through interactions with endogenic material from below, such as upwelling buoyant material, subsurface brines, or lens-shaped formations of liquid meltwater (12, 13). Tara Regio has previously been inferred to contain salty endogenic material (14, 15). Sodium chloride (NaCl) has been detected in the region and interpreted as ocean-derived (16, 17), and it also contributes to the discoloration of Tara Regio relative to the surrounding regions (16, 18).

CO2 is also enhanced within portions of Powys Regio (~0°S, 150°W), another large-scale chaos terrain, although it is less well resolved by the JWST observations (Fig. 2A). This region also exhibits generally weaker signatures of endogenic NaCl than does Tara Regio (15-17).


Potential exogenic sources 


The association between the CO2 band and Europa’s geologically young, resurfaced chaos terrain indicates a relationship between the surface CO2 and Europa’s internal chemistry. Nevertheless, we considered the possibility of exogenic origins. CO2 has also been detected on two other moons of Jupiter, Ganymede and Callisto (6, 19-22), and throughout Saturn’s satellite system (23-26). Unlike that found on Europa, the CO2 on Ganymede is preferentially associated with dark non-ice material in the oldest, most heavily cratered terrain (20). The CO2 on Callisto is associated with craters and has a hemispherical distribution consistent with exogenic processes thought to be associated with Jupiter’s corotating magnetic field (19, 21). Previously suggested explanations for the CO2 on Saturn’s icy satellites (23-26) include the delivery of CO2-bearing impactors and radiolytic production from externally derived carbonaceous materials implanted into the surface ice (23).

None of these scenarios would produce the observed relationship between Europa’s CO2 and its geologically disrupted large-scale chaos terrain. If the CO2 were externally delivered, we would not expect it to be supplied specifically to terrain formed through endogenic processes. If instead it were produced radiolytically from carbon-bearing meteoritic materials, we would expect a distribution reflecting the external implantation, radiation intensity, or both. The leading hemisphere of Europa, where Tara Regio and Powys Regio are located, is thought to receive more meteorite impacts than the trailing hemisphere (5, 27), but there is no reason to expect that input to follow the distribution of chaos terrain. The low latitudes of the leading hemisphere receive higher fluxes of >20-mega-electron volt (MeV) energetic electrons in a longitudinally and latitudinally symmetric lenslike pattern centered on the leading point (0°N, 90°W) (28). This pattern does not resemble the highly asymmetric, geologically correlated distribution that we observe.

To be stable under Europa’s surface temperatures and pressures, CO2 must be trapped within a host material (5). We therefore considered the possibility that CO2 is produced uniformly across the surface from exogenically implanted material, then somehow trapped more efficiently within chaos terrain. However, there is no evidence for a connection between chaos terrain and minerals known to trap CO2 [including amorphous ice and phyllosilicates (29, 30)]. The NaCl that is known to be enriched in Tara Regio does not contain the necessary trapping sites in its mineral structure.

We considered whether optical effects, associated with grain-size differences and the dark salty material in Tara Regio, could enhance the observed CO2 band depths in this region. Larger ice grains cause photons to encounter more dark particles of non-ice material, which would enhance band strengths if that material were the host for CO2. However, Tara Regio has been suggested to contain smaller ice grains than those found in its surroundings (31), which precludes this effect, and an equator-to-pole increase in grain size has been generally inferred across Europa’s surface (5).

After considering each of these scenarios, we reject the possibility of an exogenic origin of the CO2. Instead, we infer an endogenic origin linked to the geologically young chaos terrain.
Fig. 1. Spectra of the CO2 ν3 band on Europa. Continuum-normalized spectra, smoothed with a five-point moving average, are shown by the solid black curves. The dashed black lines indicate the continuum level. The two band minima are marked by vertical blue and red dashed lines at 4.25 μm and 4.269 μm, respectively. (A) Spectrum extracted from a spatial pixel centered at 46°N, 87°W, in which the 4.27-μm peak dominates within the ν3 band. (B) Spectrum from a pixel centered at 15°N, 112°W, in which the 4.25-μm and 4.27-μm peaks have similar depths. (C) Spectrum from a pixel centered at 8°S, 84°W within Tara Regio, where the 4.25-μm peak is stronger than the 4.27-μm peak. The locations used for each panel are indicated in Fig. 2B.


Potential endogenic sources 

In principle, CO2 might not be the original form of carbon delivered to the surface but could instead be produced within Tara Regio from the irradiation of emplaced organics (32, 33) or carbonate minerals by the >20-MeV energetic electrons that impact the leading hemisphere (28). To investigate these possibilities, we searched the JWST spectra for spectral features around the C-H stretch bands of possible organics (~3.3 to 3.4 μm) or the ~3.4- and ~3.9-μm bands of carbonates. These organic signatures have been detected on other icy satellites, including Enceladus, Iapetus, and Hyperion (34-36) (Fig. 3A), and carbonates have been observed in the bright spots of the dwarf planet Ceres (37). We do not detect any such bands in the Europa observations. However, the 3.2- to 3.6-μm wavelength region in the Europa spectra is complicated by strong bands of water ice, a water-ice reflectance peak at ~3.1 μm, and a previously identified band of hydrogen peroxide (H2O2) at 3.5 μm (38). If carbonate salts are heavily hydrated, their characteristic bands can be suppressed and hidden by water signatures (39).

Fig. 2. Maps of the CO2 ν3 band area and the ratio between its two peaks. In both panels, the disk of Europa is indicated by a black circle centered at 2.7°N, 93°W. Gray dashed lines indicate the 45°W, 90°W, and 135°W meridians and the 60°S, 30°S, 0°N, 30°N, and 60°N parallels. The boundaries (11) of the large-scale chaos regions Tara Regio (~10°S, 75°W) and Powys Regio (~0°S, 150°W) are outlined in black within the disk. (A) Map of the band area (indicated by the color bar) of the entire ν3 CO2 band. The strongest absorption occurs in Tara Regio to the right of the 90°W meridian. CO2 is also concentrated in parts of the chaos region Powys Regio on the left portion of the disk. (B) Map of the ratio (color bar) between the 4.25-μm and 4.27-μm peaks within the CO2 band. The spatial pixels outlined in white mark the locations of the spectra shown in Fig. 1. The 4.25-μm peak is the stronger of the two within the chaos terrain and at low latitudes, whereas the 4.27-μm peak is stronger in the colder, more ice-rich northern latitudes.

An endogenic reservoir of organic- or carbonate-bearing precursor material would produce stronger organic or carbonate bands in Tara Regio than in nonchaos terrains. We therefore divided the average spectrum of Tara Regio by the average spectrum of a region to its north to enhance any organic or carbonate features (Fig. 3B). The ratio is dominated by broad differences in the background continuum and near the H2O 3.1-μm reflectance peak, as well as by the excess H2O2 already known to be present within Tara Regio (40). There are no features attributable to organic material. We therefore exclude the presence of an Enceladus-like organics band at 3.44 μm (Fig. 3), with an upper limit of <20% of the band strength observed in Enceladus’ tiger stripes region (36). We also exclude the 3.55-μm band to a similar level; on Enceladus, this feature has been interpreted as resulting from organic material (36), possibly methanol (CH3OH) (41), or alternatively as H2O2 (42). On Iapetus, the aromatic and aliphatic bands are broad, but the 3.35-μm reflectance peak between them (34) would appear in the ratio spectrum (Fig. 3B) if these bands were present. However, we would not be able to discern similarly broad organic signatures against the stronger H2O and H2O2 absorptions if such a reflectance peak were absent (as would occur if the organics were dominated by either aromatic or aliphatic compounds).
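The ratio technique described above can be sketched as follows, assuming the spectra of the selected Tara Regio pixels and of the reference pixels to the north share one wavelength grid. Rescaling the ratio to 1 near 3.6 μm mirrors the normalization wavelength used in Fig. 3, but applying it to the ratio itself, the helper names, and the crude `max_deviation` upper-limit estimate are our assumptions rather than the authors' procedure.

```python
# Hypothetical sketch of a regional ratio spectrum: features shared by both
# regions cancel, so only composition differences between them remain.

def average_spectrum(spectra):
    """Mean of equal-length spectra, wavelength by wavelength."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]


def ratio_spectrum(target_spectra, reference_spectra, wavelengths, norm_at=3.6):
    """Divide the mean target spectrum by the mean reference spectrum.

    The ratio is rescaled to 1 at the sample closest to `norm_at` (micrometers).
    """
    target = average_spectrum(target_spectra)
    reference = average_spectrum(reference_spectra)
    ratio = [t / r for t, r in zip(target, reference)]
    k = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - norm_at))
    return [x / ratio[k] for x in ratio]


def max_deviation(ratio, wavelengths, lo, hi):
    """Largest |1 - ratio| inside [lo, hi]: a crude band-depth upper limit."""
    return max(abs(1.0 - x) for w, x in zip(wavelengths, ratio) if lo <= w <= hi)
```

A deviation found this way near 4.0 μm would correspond to the <1% feature discussed in the text, although a real analysis would also need to account for noise and the NIRSpec wavelength gap.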
The ratio spectrum (Fig. 3B) shows a <1% deviation from the continuum near 4.0 μm, which is coincident with the strongest carbonate absorption band for Ceres’ Occator crater. This is consistent with a minor contribution of carbonate minerals in Tara Regio. However, we consider this potential feature to be too shallow and indistinct in shape to constitute a detection of carbonate, particularly because it falls close to the edge of a gap in the NIRSpec wavelength coverage. We therefore put an upper limit of <1% on any ~3.9- to 4.0-μm carbonate bands, which is equal to the observed depth of this feature. This is equivalent to a band depth of <2.5% of the band observed at Occator crater on Ceres (37). No organics or carbonates have been identified in higher-spatial-resolution observations of the same chaos terrain from ground-based observatories (which did not cover the CO2 band) (15, 40). Laboratory experiments have investigated the electron irradiation of hydrocarbon and water-ice mixtures at Europa-relevant temperatures, finding production of refractory long-chain aliphatic organic material and CO2 (33). No equivalent laboratory experiments have been reported for irradiation of carbonates. The JWST data provide no evidence for such additional carbon-bearing material on Europa or preferentially within Tara Regio. We therefore prefer the interpretation that the CO2 we observed on Europa was delivered from an endogenous source already in the form of CO2.
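The two carbonate upper limits quoted above are linked by a simple proportion. Writing D_Occator for the depth of the ~3.9- to 4.0-μm carbonate band at Occator crater (a symbol introduced here only for illustration), the statement that a 1% band on Europa corresponds to 2.5% of the Occator band implies

$$0.025\,D_{\text{Occator}} \approx 0.01 \quad\Longrightarrow\quad D_{\text{Occator}} \approx \frac{0.01}{0.025} = 0.40,$$

that is, an Occator band depth of roughly 40%, inferred here from the quoted ratio rather than taken directly from (37).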


Double-peaked CO2 band profiles 


The JWST data resolve two discrete peaks within the ν3 band of Europa’s CO2 (Fig. 1), which constrains its physical state on the surface. The 4.25-μm peak implies that the associated CO2 exists in a trapped form, which shifts the fundamental vibration to higher frequencies than the nominal position of ~4.27 μm expected for pure CO2 ice. This phenomenon has previously been invoked to explain the stability and band positions (4.257 to 4.258 μm) of the CO2 bands observed on Ganymede and Callisto, although there is no consensus on the trapping mechanisms and host materials (19-21, 30). The band positions on Europa, Ganymede, and Callisto are all shifted in the opposite direction from that expected for CO2 trapped in amorphous water ice (29) and do not match the shift expected for CO2 clathrate either, although CO2 in clathrates has two discrete band minima (43). At the cryogenic temperatures and near-vacuum pressures on the surfaces of these moons, CO2 adsorbed onto phyllosilicates produces ν3 peaks that, in some cases, match those observed on Ganymede and Callisto (30), although phyllosilicate minerals have not been identified on their surfaces and we consider them implausible for Europa’s salty resurfaced terrain.

Fig. 3. Comparison between spectra of Europa and of other Solar System bodies. (A) Spectra (average of 4 spatial pixels) of Tara Regio (green) and a location north of Tara Regio (blue). Labels indicate the ν3 band of CO2, the 3.5-μm band of hydrogen peroxide (H2O2), and the 3.1-μm Fresnel reflectance peak of water ice. Also shown for comparison are a spectrum of Occator crater on Ceres (dotted black line), which has absorption due to carbonate minerals (37); a spectral ratio of the tiger stripes region on Enceladus relative to its nearby regions (dashed black line) (36), which exhibits absorptions interpreted as organics; and a continuum-removed spectrum of the trailing hemisphere of Iapetus (dash-dotted black line), which shows bands at ~3.3 μm and ~3.45 μm due to aromatic and aliphatic organics, respectively (34). All spectra have been normalized at 3.6 μm, and the Ceres, Enceladus, and Iapetus spectra are offset for display by 0.55, 0.5, and 0.78 units, respectively. (B) The ratio (solid black line) between the spectra of Tara Regio and the region to its north from (A). The same spectra of Enceladus, Iapetus, and Ceres as shown in (A) are included for comparison, with additional bands labeled. We attribute the 3.5-μm band in the ratio spectrum to excess H2O2, which has previously been seen to be more abundant in Tara Regio (40). No other absorption bands are evident in the ratio spectrum, except for a <1% deviation from the continuum near 4.0 μm, which is too shallow to constitute a detection of carbonates and is located close to the gap in NIRSpec wavelength coverage. (C) Optical imaging mosaic of Europa projected to the JWST observing geometry (51) overlain with the boundaries of the chaos terrain [black outlines (11)]. Also shown are the locations of the NIRSpec pixels that we used to produce the average spectra shown in (A), for Tara Regio (green squares) and the terrain to its north (blue squares). Dashed white lines mark the 45°W, 90°W, and 135°W meridians and the 60°S, 30°S, 0°N, 30°N, and 60°N parallels.

We have not identified any laboratory data that both match the wavelength of Europa’s 4.25-μm CO2 peak and use a host material that we consider plausible on the basis of Europa’s known surface composition. The substantial difference between the CO2 bands observed on Europa and those on Ganymede and Callisto indicates a different trapping mechanism, host material, or both. This is consistent with the apparently different origins of the moons’ CO2, which has been associated with older, dark terrain on Ganymede and with craters and magnetospheric processes on Callisto (19-21). The ν3 CO2 bands observed on several moons of Saturn are also shifted to higher frequencies but vary in wavelength by up to 0.014 μm between the different bodies, with Dione and Hyperion providing the closest matches [4.253 μm and 4.252 μm, respectively (23)] to Europa’s band position, although still not an exact match. The positions of these bands on Dione and Hyperion indicate trapped CO2 associated with other materials or ices, but the specific hosts are unknown (23).

The CO2 peak at ~4.27 μm observed on Europa is consistent with the position of 4.267 to 4.268 μm for pure CO2 ice (43, 44), which is too volatile to be stable at Europa’s surface conditions (5). Either there is another (unknown) trapping mechanism that produces a minimal band shift, or a single trapping mechanism causes band splitting, perhaps due to distinct trapping sites within a single host material [as in CO2 clathrate, which has a double peak (43)]. The band depths of both the 4.25-μm and 4.27-μm peaks are strongest in Tara Regio, but their ratio varies across the surface. Within Tara Regio and across the equatorial latitudes, the 4.25-μm peak dominates, whereas the 4.27-μm peak is stronger across the northern latitudes (Fig. 2B), which are colder and more enriched in water ice (31, 45). We suggest that low temperatures, ice abundance, or both could be related to the longer-wavelength peak.
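One simple way to quantify which of the two minima dominates in a pixel is to compare the absorption depths near the measured band minima (4.249 and 4.268 μm). The sketch below assumes continuum-normalized spectra and a hypothetical ±0.003-μm search window; the ratio mapped in Fig. 2B may be defined differently (for example, from fitted peak areas), so treat this as illustrative only.

```python
# Hypothetical sketch: ratio of the ~4.25-um to the ~4.27-um CO2 absorption depth.

def depth_near(wavelengths, normalized_flux, center, half_width=0.003):
    """Deepest absorption (1 - F/Fc) within +/- half_width micrometers of center."""
    depths = [1.0 - f for w, f in zip(wavelengths, normalized_flux)
              if abs(w - center) <= half_width]
    if not depths:
        raise ValueError(f"no samples within {half_width} um of {center} um")
    return max(depths)


def peak_ratio(wavelengths, normalized_flux, short_um=4.249, long_um=4.268):
    """Values > 1: the 4.25-um peak dominates (as in Tara Regio and at low
    latitudes). Values < 1: the 4.27-um peak dominates (as at northern latitudes)."""
    return (depth_near(wavelengths, normalized_flux, short_um)
            / depth_near(wavelengths, normalized_flux, long_um))
```

Because a ratio of depths is insensitive to the overall band strength, it separates the question of which peak dominates from the question of how much CO2 a pixel contains.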


Implications for Europa’s atmosphere and ocean 


We interpret the observed association of CO2 with large-scale chaos terrain as indicating emplacement of carbon from the interior. The carbon is probably supplied as CO2 formerly dissolved in the subsurface ocean, but it could also be in the form of other carbon-bearing precursors. Emplacement could have occurred during the formation of such disrupted regions, with potential mechanisms including subsurface upwelling, melt-through, and ice collapse above shallow subsurface liquid (12, 13). At the surface, the volatile CO2 must be trapped in one or more other materials, but we expect ongoing sputtering of Europa’s surface (46) to release some CO2 into Europa’s tenuous atmosphere. Photoionization and interactions with Jupiter’s magnetosphere would then remove CO2 from the system, as has been predicted for Callisto (where the expected timescale is ~4 years) (47). CO2 that enters the atmosphere, but is not immediately removed, is unlikely to exit the atmosphere but could instead hop across the surface [as has been simulated for Iapetus (48)] until it is either retrapped or lost. A continual or recent supply of CO2 is therefore required to explain the observed CO2 on the surface, which is consistent with the maximum CO2 band strengths being associated with the youngest terrain.

Our interpretation implies that carbon, a biologically essential element, is present in Europa’s subsurface ocean and has reached the surface ice on a geologically recent timescale. If this carbon was delivered as CO2, and if that CO2 is representative of the carbon redox state in the ocean, then a highly reduced ocean chemistry is unlikely (49). An oxidized ocean would instead be consistent with the proposed downward delivery (through the ice crust) of radiolytically produced surface oxidants (e.g., O2 and H2O2) on geologic timescales (3). An ocean rich in CO2 would also be consistent with slightly acidic conditions and a metamorphic origin of the ocean (50).
CARTELS


Reducing cartel recruitment is the only way to lower violence in Mexico


Rafael Prieto-Curiel, Gian Maria Campedelli, Alejandro Hope


Mexican cartels lose many members as a result of conflict with other cartels and incarcerations. Yet, 
despite their losses, cartels manage to increase violence for years. We address this puzzle by leveraging 
data on homicides, missing persons, and incarcerations in Mexico for the past decade along with 
information on cartel interactions. We model recruitment, state incapacitation, conflict, and saturation 
as sources of cartel size variation. Results show that by 2022, cartels counted 160,000 to 185,000 units, 
becoming one of the country’s top employers. Recruiting between 350 and 370 people per week is 
essential to avoid their collapse because of aggregate losses. Furthermore, we show that increasing 
incapacitation would increase both homicides and cartel members. Conversely, reducing recruitment 
could substantially curtail violence and lower cartel size. 


Latin America is home to only 8% of the world’s population, but roughly one in three intentional homicides worldwide occur in the region (1). Mexico accounts for a substantial share of such homicides, primarily because of the long-standing presence of cartels across many areas of the country. In 2021, Mexico reported 34,000 victims of intentional homicide (nearly 27 victims per 100,000 inhabitants) and was ranked among the least peaceful countries in Latin America (2). Between 2007 and 2021, the number of homicides in the country increased by more than 300% (3), with institutional sources estimating that between 2006 and 2018, about 125,000 to 150,000 homicides were related to organized crime in Mexico (4).

The effects of cartels on Mexico’s society are far-reaching. They include cartels’ involvement in a wide array of illegal activities beyond drug trafficking (5, 6), the deterioration of human rights (7), and the weakening of institutional stability through extensive acts of violence (8, 9). Furthermore, some cartels have acquired a transnational dimension, expanding their business to the United States and beyond (10).

In this context, although cartels lose dozens 
of members daily as a result of killings and 
state incapacitation through incarcerations, 
the violence over the years has not decreased. 
We tackle this puzzle by studying cartels’ evo- 
lution, deriving their sizes, and considering 
four fundamental sources of size variation: 
recruitment, incapacitation, conflict, and sat- 
uration. These sources capture the different 
exogenous and endogenous dynamics explain- 
ing why and to what extent cartels grow or 
shrink. Recruitment refers to the process of 
attracting a new workforce that stably carries 
out tasks (both strictly criminal and not) for cartels (11). Incapacitation measures the ability of the state to counter cartels through incarceration (12). Considering all incarcerations
allows us to avoid the bias of only focusing 
on incarcerations for homicides, which are 
only a fraction of the offenses committed by 
cartel members. Conflict describes the extent 
to which cartels clash and fight with each 
other (13, 14). Finally, saturation character- 
izes internal instability and dropouts, which 
lead to organizational fragmentation (4, 15). 

Despite Mexican cartels’ economic, social, 
and political importance, we lack essential in- 
formation to better understand how they func- 
tion. In fact, we primarily lack estimates of the 
size of these criminal entities. We also lack 
systematic estimates of cartel-related killings 
and kidnappings and figures related to recruit- 
ment trends, which makes it extremely difficult 
to deepen our knowledge about their presence, 
resources, and goals. The secretive nature of 
cartels’ actions, as well as the insufficient amount 
of information accessible to map them, makes 
them conceptually similar to black boxes, from 
which we can only extrapolate imperfect proxies 
of activity using, for instance, the daily number of homicides or the number of drug-related incarcerations that occurred in the country. Although homicide and incarceration trends are imperfect because they do not discriminate between offenses that occurred specifically in the context of organized crime, they can be used to estimate cartels’ violence capacity and the state’s incapacitation against them. In this work, we build on this intuition and exploit data on murders, missing persons, and incarcerations in Mexico between 2012 and 2022 to derive cartel size. We propose a mathematical system to represent their behavior over 10 years and seek to shed light on the mechanisms within the so-called black box of the cartels.

This work has two main goals. First, we aim 
to obtain plausible estimates of the cartels’ 
population, including their number of mem- 
bers and recruitment capacity. Second, we seek 
to simulate different policy scenarios (i.e., in- 
creased state incapacitation and recruitment 
prevention) to disentangle the effects of vary- 
ing strategies to curb cartels’ power and, in 
turn, violence in the country. Our conceptual framework is built on the evidence that, despite the high number of murders and incarcerations in the past 10 years, cartels have maintained and even increased their power, control, and resources, introducing even more violence in the country. To construct our model, we gather data on 150 cartels active in Mexico in 2020, including information on their alliances and rivalries and data corresponding to homicides, missing persons, and incarcerations.


Methods 


We ask two research questions (RQs). RQ1: 
What is the size of Mexico’s cartel population, 
and what is their capacity to recruit members? 
RQ2: To control cartel violence, is a preventive policy strategy (focused on reducing cartel recruitment efforts) more effective than a reactive policy strategy (focused on increasing police efforts to incarcerate cartel members)?

We consider four mechanisms that explain why cartel size varies: recruitment, incapacitation, saturation, and conflict (Fig. 1). We model the conflict between cartels with a weighted network, where a node represents each cartel, and an edge represents a conflict in some state in Mexico. Similarly, we construct a weighted network of alliances between cartels across different states. The model is a system of coupled differential equations, one for each cartel. Although we cannot observe most aspects of cartels (such as their recruitment and internal conflicts), we use the observed number of casualties and incarcerations to estimate the model parameters and infer the size of each cartel. We then use those estimates to forecast different scenarios for the next 5 years in Mexico. See the supplementary materials, sections A to F, for details on methodology.

Fig. 1. Model diagram representing the four reasons why a cartel changes in size. Most cartel-related activities remain undercover, but we observe some of their by-products in casualties and incapacitations.


Results 
RQ1: Estimating cartels’ populations 


Most cartel-related activities are organized as dark networks to keep their operations and activities covert (17, 18). However, their
human losses caused by homicidal violence 
and the state’s action through incapacitation 
provide insights into the overall amounts of 
such activities. We leverage the trends in homi- 
cides, missing persons, and incarcerations over 
the past decade to motivate our investiga- 
tion of cartels’ sizes in Mexico (supplementary 
materials, section A). Not all losses are directly 
related to the conflict between cartels (e.g., 
domestic violence), and some are a by-product 
of their dispute (e.g., deaths suffered by family 
members or bystanders). To study the size and 
evolution of the cartel population, we exclu- 
sively model homicides between cartel mem- 
bers (i.e., homicides in which the victim and 
the perpetrator are both cartel members). The 
starting point is that cartels have not seen their 
power diminished because violence has not re- 
duced either. In Mexico, 686 people were killed 
each week of 2021, with an additional 137 people 
reported as missing and yet to be found, and 
more than 2500 people were imprisoned each 
week (3, 19, 20). 

Fig. 2. Rivalries and alliances observed between 150 active cartels in Mexico in 2020. The size of the node represents the estimated cartel size. If cartels have at least one state rivalry, nodes are connected (left). The width of the edge corresponds to the number of states in which cartels fight. Nodes are connected if they are identified as allies (right). NF Mich., Nueva Familia Michoacana; UTepito, Union Tepito; Z, Los Zetas; SRdL, Santa Rosa de Lima.

We use the number of cartel losses to infer otherwise unknown properties, including their size and recruitment rate. Data compiled from open sources by the Programa de Política de Drogas (PPD) (21) enable us to detect the existence of κ = 150 active cartels in Mexico in 2020. Building on such data, we operationally define cartels as those criminal organizations that are found to be active in Mexico, regardless of their size and activity (supplementary materials, section B). Cartels have different interactions: They can be allies, they can have no interactions (particularly when they operate in distant locations), or they can fight for territory or resources, creating substantial losses among both groups. To represent these interdependencies, we construct two separate weighted networks, the allies A and the rivalries R, to recreate conflicting and cooperating cartels, with weights corresponding to the number of states in which two cartels interact (Fig. 2). Major cartels, such as Cartel Jalisco Nueva Generación (CJNG), the Sinaloa Cartel, and Nueva Familia Michoacana, are present almost at a national level and have alliances with many satellite organizations, forming three main clusters. These clusters fight against each other, creating most of the violence between cartels (6). Smaller organizations are local to one city and have few interactions (cooperation or conflict) with other cartels.
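The construction of the weighted alliance (A) and rivalry (R) networks can be sketched as below, assuming the source data can be flattened into records of the form (cartel, cartel, state). Each edge weight counts the distinct states in which a pair interacts, as described in the text; the record format, the function names, and the placeholder cartel labels X, Y, and Z are hypothetical.

```python
from collections import defaultdict


def weighted_network(records):
    """Undirected weighted network from (cartel_a, cartel_b, state) records.

    The weight of an edge is the number of distinct states in which the two
    cartels were recorded as interacting; duplicate records are counted once.
    """
    states_by_pair = defaultdict(set)
    for a, b, state in records:
        if a == b:
            continue  # ignore self-loops
        states_by_pair[frozenset((a, b))].add(state)
    return {pair: len(states) for pair, states in states_by_pair.items()}


def to_matrix(network, cartels):
    """Dense symmetric weight matrix with rows/columns ordered as in `cartels`."""
    index = {name: k for k, name in enumerate(cartels)}
    n = len(cartels)
    weights = [[0] * n for _ in range(n)]
    for pair, weight in network.items():
        a, b = tuple(pair)
        weights[index[a]][index[b]] = weight
        weights[index[b]][index[a]] = weight
    return weights
```

Building A and R with the same function keeps the two networks directly comparable; only the input records differ.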

The number of members of cartel i at time t, expressed as C_i(t), increases instantly according to ρC_i, where ρ is the fixed recruitment rate. Because of state forces, the size of the cartel decreases by ηC_i/∑_j C_j for some η > 0 that represents the incapacitation rate. Because of internal instability, dropouts, and diminishing returns, large groups decrease their size instantly by θC_i² for some small value of θ > 0, known as the saturation rate (22, 23). The impact of conflict between two cartels, i and j, is modeled according to the number of homicide offenders between rival groups, which is assumed to be proportional to cartel size, so cartel i suffers instant casualties according to δC_iC_j, where δ ≥ 0 is the deadly rate of conflict related to homicide offenders within cartels. Combining recruitment, incarceration, conflict, and saturation, we obtain

$$\dot{C}_i = \underbrace{\rho C_i}_{\text{recruitment}} - \underbrace{\eta \frac{C_i}{\sum_j C_j}}_{\text{incapacitation}} - \underbrace{\delta C_i \sum_j S_{ij} C_j}_{\text{conflict}} - \underbrace{\theta C_i^2}_{\text{saturation}} \qquad (1)$$

where Ċ_i indicates the rate of change in the size of cartel i, and S_ij ≥ 0 captures the interaction between cartels i and j. We obtain a system of κ = 150 coupled differential equations, one for each cartel (supplementary materials, section C). The number of weekly casualties produced by all cartels is given by d(t) = δCᵀSC, where C = (C_1, C_2, ..., C_κ). Cartels recruit ρC individuals, where C = ∑_i C_i, and i(t) = η are incapacitated. In line with previous works on other types of organizations, we assume that the initial cartel size follows a heavy-tailed distribution (supplementary materials, section D) (24-26). We use the observed weekly number of casualties and incapacitations to estimate the time-varying number of cartel members C_i(t).
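To make Eq. (1) concrete, the sketch below integrates it with a forward-Euler scheme using a 1-week step and computes the weekly totals d(t) = δCᵀSC and i(t). Summing the incapacitation term ηC_i/∑_j C_j over all cartels gives exactly η, which `weekly_incapacitations` illustrates numerically. The solver choice, the clipping of sizes at zero, and any parameter values passed in are assumptions for illustration; this is not the authors' fitting code.

```python
# Hypothetical sketch of Eq. (1); parameter values are illustrative, not fitted.

def derivatives(C, S, rho, eta, delta, theta):
    """Right-hand side of Eq. (1) for every cartel i."""
    total = sum(C)
    rates = []
    for i, Ci in enumerate(C):
        recruitment = rho * Ci
        incapacitation = eta * Ci / total if total > 0 else 0.0
        conflict = delta * Ci * sum(S[i][j] * Cj for j, Cj in enumerate(C))
        saturation = theta * Ci ** 2
        rates.append(recruitment - incapacitation - conflict - saturation)
    return rates


def weekly_casualties(C, S, delta):
    """d(t) = delta * C^T S C, i.e. the conflict terms summed over all cartels."""
    return delta * sum(Ci * S[i][j] * Cj
                       for i, Ci in enumerate(C) for j, Cj in enumerate(C))


def weekly_incapacitations(C, eta):
    """Sum over i of eta * C_i / sum_j C_j, which reduces to eta."""
    total = sum(C)
    if total == 0:
        return 0.0
    return sum(eta * Ci / total for Ci in C)


def simulate(C0, S, rho, eta, delta, theta, weeks, dt=1.0):
    """Forward-Euler integration of Eq. (1); sizes are clipped at zero."""
    C = list(C0)
    history = [C]
    for _ in range(int(round(weeks / dt))):
        rates = derivatives(C, S, rho, eta, delta, theta)
        C = [max(Ci + dt * r, 0.0) for Ci, r in zip(C, rates)]
        history.append(C)
    return history
```

For example, `simulate([1000.0, 500.0, 200.0], S, rho=0.01, eta=5.0, delta=1e-6, theta=1e-6, weeks=52)` with a small symmetric rivalry matrix `S` returns 53 snapshots of three hypothetical cartel sizes. A smaller `dt` or an adaptive ODE solver would be preferable if the chosen parameters make the system stiff.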


Not all observed deaths, missing persons, + 
and incapacitations in the country are suffered ‘ 


by cartel members, and most incapacitations 
are not linked to the incarcerations of cartel 


members. In our analysis, we estimate casual- - 


ties as the sum of missing persons with mur- 
ders and consider that a fraction f = 10% of 
the observed weekly deaths and a fraction g = 
5% of the incapacitations are cartel members 
(supplementary materials, section D). In total, 
50,000 casualties and 55,000 incapacitations 
directly involve cartel members. On the basis 
of these figures, we estimate that in 2012, there 
were 115,000 cartel members and that by 2022, 
the number increased to 175,000. Thus, despite 
efforts from the state to hinder their power, car- 
tels have increased their size by 60,000 members 
in a decade. Incarcerating nearly 6000 cartel 
members each year has not prevented them 
from growing into larger organizations. Given 
the current conditions, we estimate 120 cartel-related deaths per week, an increase of 77% between 2012 and 2022. To ensure that our results are not driven by incorrect assumptions
about the number of homicides between cartel members and incarcerations of cartel affiliates, we conduct sensitivity tests considering scenarios with between 40,000 and 60,000 cartel casualties and between 45,000 and 65,000 incapacitations. Varying these two parameters, we find that the total population of cartel members in 2022 lies between 160,000 and 185,000. Additional sensitivity tests were used to quantify the effect of potentially missing data on alliances and rivalries at the network level. Adding 10% more cartels would, on average, lead to 3.2% more members than the estimated 175,000. We also provide evidence that adding 10% more alliances or rivalries would affect the overall level of violence by at most 5% (supplementary materials, section E). Even under a conservative scenario, Mexican cartels have lost around 200 members per week for years (Fig. 3A). Specifically, we estimate that in a decade, 285,000 people acted as cartel members and that, in total, 37% of them are either deceased (17%) or incarcerated (20%).
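The headline figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text and assumes 52 weeks per year; it is meant to show that the roughly 200 weekly losses and the 17%/20% split follow from the reported totals to within rounding, nothing more.

```python
# Rounded figures reported in the text (cartel members, 2012-2022).
members_2012 = 115_000
members_2022 = 175_000
casualties = 50_000          # cartel members killed or missing
incapacitations = 55_000     # cartel members incarcerated
ever_members = 285_000       # people who acted as cartel members in the decade
weeks = 10 * 52              # assumption: 52 weeks per year over 10 years

growth = members_2022 - members_2012                     # net growth over the decade
weekly_losses = (casualties + incapacitations) / weeks   # members lost per week
share_dead = casualties / ever_members
share_incapacitated = incapacitations / ever_members
share_active = members_2022 / ever_members

print(f"net growth: {growth:,}")
print(f"weekly losses: {weekly_losses:.0f}")
print(f"dead {share_dead:.1%}, incapacitated {share_incapacitated:.1%}, "
      f"active {share_active:.1%}")
```

This gives a net growth of 60,000, about 202 losses per week, and shares of 17.5% dead, 19.3% incapacitated, and 61.4% still active, consistent with the rounded values in the text and in the Fig. 3 caption.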

Despite competition with other cartels and 
state forces’ incapacitation, cartels have prevailed 
for decades. Between January and December 
of 2021, cartels recruited 19,300 individuals, 
losing 6500 members as a result of conflict 
with other cartels and 5700 members as a 
result of incapacitation, which resulted in a 
net gain of roughly 7000 members during that 
year (supplementary materials, section D). A 
similar estimate is observed for each year between 2012 and 2022. Unless all cartels combined recruit between 350 and 370 people per week, they would collapse as a result of conflict, incapacitation, and saturation combined (Fig. 3A).
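The 2021 balance quoted above is a stock-and-flow identity: net change equals recruits minus losses to conflict and to incapacitation. A minimal check with the stated figures, assuming 52 weeks per year:

```python
# 2021 flows reported in the text (people per year).
recruited = 19_300
lost_conflict = 6_500
lost_incapacitation = 5_700

net_gain = recruited - lost_conflict - lost_incapacitation
weekly_recruitment = recruited / 52                        # assumption: 52 weeks per year
weekly_losses = (lost_conflict + lost_incapacitation) / 52

print(f"net gain in 2021: {net_gain:,}")
print(f"recruits per week: {weekly_recruitment:.0f}, losses per week: {weekly_losses:.0f}")
```

The net gain of 7,100 matches the "roughly 7000" in the text, and 19,300 recruits a year corresponds to about 371 per week, close to the upper end of the 350 to 370 range.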

Given the estimated overall population, all 
cartels combined are the fifth largest employer 
in Mexico (27) (Fig. 3B). The 10 largest cartels 
in Mexico have more than 50% of the active 
affiliates in the country, but the conflict be- 
tween them only produces 15% of the fatalities 
(Fig. 3C). Most cartels are small local organ- 
izations playing a critical role in creating vi- 
olence in the country, often becoming targets 
of more powerful organizations. Previous re- 
search has suggested that large cartels fre- 
quently adopt fragmented cells of other weaker 
and less experienced structures (16). Small cartels play a crucial role because they are more likely to become targets of powerful illicit organizations than to fight organizations of similar size. We estimate that
more than half of the country’s casualties 
result from the fight between the smallest 
140 and the largest 10 cartels (supplemen- 
tary materials, section B). 


RQ2: Comparing policy scenarios 


On the basis of the size of cartels in 2022 and 
the trends observed in the past decade, we 
predict that the weekly number of casualties 
related to organized crime will keep increasing 
in the coming years. We estimate that if current 
trends continue, cartels will keep increasing 
their power, and we could observe 40% more 
casualties and 26% more cartel members by 
2027.

[Figure 3, panels B and C. (B) Employees of the largest companies in Mexico: Femsa 327,000; Walmart 231,000; Manpower 203,000; America Movil 181,000; cartels (combined); Oxxo 168,000; Bimbo 138,000; Pemex 124,000; Coppel 114,000; Grupo Salinas. (C) Shares of active cartel members: CJNG 17.9%; Sinaloa 8.9%; NF Michoacana 6.2%; Noreste; Union Tepito.]

Fig. 3. Current size of cartels and career paths for recruited members. (A) Between 2012 and 2022, we estimate that 285,000 people took part as cartel members, but only 60% were still active by 2022. The cartel career is brief and risky. Roughly 17% of them are dead, and 20% are incapacitated. (B) Number of employees from the top 10 companies in Mexico and the combined size of cartels (27). We estimate that cartels had between 160,000 and 185,000 members combined. (C) Of the 175,000 active cartel members, roughly 17.9% are part of CJNG, 8.9% are part of Cartel de Sinaloa, and 6.2% are from Nueva Familia Michoacana, the top three cartels in size.

We test the effectiveness of two main policy scenarios designed to reduce future violence
in the country: first, a preventive strategy aimed 
at reducing cartel recruitment, and second, a 
reactive strategy aimed at increasing incapacitation. On the one hand, doubling incapacitation, with all of the associated costs and challenges of expanding security resources (including police personnel, the army, prisons, etc.), will still result in an 8% increase in the number of casualties and a 6% increase in the number of cartel members. In other words, even doubling incarcerations will translate into a rise in violence (Fig. 4). Cartels have a critical equilibrium at which their recruitment compensates for their losses, maintaining a stable size. Yet, if the recruitment rate of a cartel is 10% above its equilibrium, the incapacitation rate must increase by more than 21% to dismantle it (supplementary materials, section F).
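The idea of a critical equilibrium can be seen in the simplest special case of Eq. (1): a single cartel with no rivals ($S_{ij} = 0$), for which $\dot{C} = (\rho - \eta) C - \omega C^2$. This logistic form has a stable nonzero size $C^* = (\rho - \eta)/\omega$ when recruitment exceeds incapacitation, and the cartel dwindles toward zero when $\eta \geq \rho$. The sketch below illustrates only this conflict-free case with hypothetical weekly rates; it does not reproduce the 21% threshold, which the authors derive for the full model in the supplementary materials.

```python
def equilibrium_size(rho: float, eta: float, omega: float) -> float:
    """Stable fixed point of dC/dt = (rho - eta) * C - omega * C**2 (Eq. 1 with S = 0)."""
    return max((rho - eta) / omega, 0.0)


def simulate_isolated(C0: float, rho: float, eta: float, omega: float, weeks: int) -> float:
    """Forward-Euler integration with a one-week step (an assumption); returns the final size."""
    C = float(C0)
    for _ in range(weeks):
        C = max(C + (rho - eta) * C - omega * C * C, 0.0)
    return C


if __name__ == "__main__":
    # Hypothetical weekly rates, chosen only for illustration.
    rho, omega = 0.02, 1e-5
    for eta in (0.01, 0.03):  # incapacitation below, then above, recruitment
        print(eta, equilibrium_size(rho, eta, omega),
              round(simulate_isolated(200, rho, eta, omega, weeks=2000), 2))
```

With these made-up rates the cartel settles near 1000 members when $\eta < \rho$ and shrinks toward zero when $\eta > \rho$; in the full model, conflict with rivals shifts where that threshold lies.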

Conversely, halving cartels' ability to recruit will reduce weekly casualties by 25% and cartel size by 11% by 2027. Mathematically, a preventive strategy is far more successful than a traditional reactive strategy. However, the cartel population is so large that, even in the hypothetical scenario where recruitment drops to zero, it would take 3 years to return to the already high levels of violence observed in 2012. This calls for rapid, large-scale initiatives to reduce recruitment in the country.
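A back-of-the-envelope version of the policy comparison can be made with the 2021 flows reported earlier in this section (19,300 recruits, 6500 losses to conflict, and 5700 to incapacitation). The sketch below is not the authors' simulation: it holds conflict losses fixed at their 2021 level and ignores saturation and the feedback of cartel size on every flow, so only the signs, not the magnitudes, are meaningful.

```python
def net_annual_change(recruited: float, lost_conflict: float, lost_incapacitation: float) -> float:
    """Stock-and-flow balance: recruits minus losses (people per year)."""
    return recruited - lost_conflict - lost_incapacitation


# 2021 flows from the text; conflict losses are held fixed (a simplifying assumption).
RECRUITED, CONFLICT, INCAPACITATED = 19_300, 6_500, 5_700

scenarios = {
    "current trends": net_annual_change(RECRUITED, CONFLICT, INCAPACITATED),
    "double incapacitation": net_annual_change(RECRUITED, CONFLICT, 2 * INCAPACITATED),
    "halve recruitment": net_annual_change(RECRUITED / 2, CONFLICT, INCAPACITATED),
}
for name, change in scenarios.items():
    print(f"{name:>22}: {change:+,.0f} members per year")
```

Doubling incapacitation removes an extra 5700 members a year, whereas halving recruitment removes 9650, so under these crude assumptions only the preventive scenario turns net growth negative (+1,400 versus -2,550 per year), in line with the qualitative ranking reported above.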

We also assess the effects of two ancillary policy scenarios. The first is designed to alter the type of conflict between cartels (e.g., pushing for a narcopeace), and the second targets cartels' saturation levels (i.e., making cartels more prone to fragmentation). Neither strategy outperforms the positive effects that a reduction in recruitment could produce (supplementary materials, section E). Decreasing conflict by 20% reduces the number of casualties by 8.7%, whereas increasing saturation by 20% lowers the number of homicides between cartel members by 5.4%. Under the current estimated conditions, the growth of cartels' size is held back mainly by the conflict among organizations rather than by the state's ability to reduce violence in Mexico.


Discussion 


For the past 15 years, Mexico has suffered from 
staggering levels of violence. Most of the vi- 
olence has been perpetrated by cartels fighting 
against each other (4). Despite the relevance of 
cartels, we lack basic information on their size 
and the impact of different policies that seek 
to curb their power. To the best of our knowl- 
edge, this work represents the first scholarly 
attempt to mathematically quantify the size of 
the cartel population in Mexico and to compare 
policy scenarios intended to decrease violence in the country. Overall, our work advances the growing literature on mathematical and statistical simulations for studying complex criminal phenomena (28-30).

Fig. 4. Forecast of the number of casualties and cartel size according to four different strategies. Weekly cartel-related deaths (top) and cartel size (bottom) if trends continue, if incapacitation doubles, if recruitment is reduced by half, and if recruitment is reduced to zero. Estimates for 2027 are obtained by keeping the 2022 estimates and adjusting the corresponding values of incapacitation or recruitment.

Our simulations yield several key findings. We estimated that the cartel population numbered 160,000 to 185,000 members by 2022 and that, over the 2012 to 2022 period, 285,000 people acted as cartel members. Given these figures, we showed that in 2022, cartels needed to recruit between 350 and 370 people per week to avoid collapse as a result of the joint effects of conflict, incapacitation, and saturation. Furthermore, we assessed the effectiveness of two main scenarios to curb cartel violence: preventive (intended to prevent recruitment) and reactive (designed to increase incapacitation through incarceration).
If current levels of incapacitation are doubled, some violence will be contained, but we would still expect an increase in weekly casualties. Conversely, reducing recruitment by half leads to a 25% decrease in homicides. We also tested the effect of two ancillary scenarios: reducing conflict by pushing for agreements between cartels, and fragmentation, intended to decrease cartels' power through internal fights (supplementary materials, section E). The preventive strategy remained substantially more effective in reducing violence in the country. Tackling recruitment will have a triple effect in the future. First, it will lower the number of cartel members, and thus the violence they can inflict, because there are fewer killers. Second, it will lower the number of targets, so fewer people are exposed to violence. Third, it will reduce cartels' capacity for future recruitment.

Although offering policy recommendations 
is beyond the scope of this work, our results 
can prompt policy-related reflections. Many 
initiatives to counter organized crime aim to 
increase incapacitation through incarceration. 
In this work, we show that substantially increasing incapacitation may not necessarily reduce violence. By contrast, we offer an alternative scenario centered on reducing recruitment and suggest that it may have longer-lasting beneficial effects. More than 1.7 million people in Latin America are incarcerated, and adding more people to saturated jails will not solve the insecurity problem (31).

Despite the contributions of this investigation, it has some limitations. First and foremost, although the lack of data on the size of cartels is the inherent motivation for this work, it also represents a structural limitation because our estimates cannot be meaningfully validated with real-world information. We took all possible precautions to obtain statistically consistent estimates through extensive sensitivity analyses, but this does not eliminate the core validation issue. A thorough reflection on other sources of limitation and on our assumptions is provided in the supplementary materials, section I. These include (i) temporal variability in rivalries and alliances, (ii) alternative sources of variability in cartel size, and (iii) the lack of a finite population.

Results highlight the need to devote more 
attention to recruitment. Reducing recruit- 
ment requires structural efforts at the state 
and local levels. This especially applies to areas 
with high cartel support, where offering edu- 
cational and professional opportunities that 
outweigh the short-term benefits offered by 
cartels represents a critical goal for the future 
of the country (32-35). Future work on this 
topic should focus on enriching our model of cartel size variation with additional sources, such as cartel fragmentation, and should also consider studying recruitment dynamics using data on finite populations to obtain mathematical models that account for individual risk factors (such as age and sex) in the computation of violence and recruitment trends.
Manipulating mitochondrial electron flow enhances tumor immunogenicity

Bianca Parisi, Mercedes Rincon, Matthew G. Vander Heiden, Marcus Bosenberg, Diana C. Hargreaves, Susan M. Kaech, Gerald S. Shadel


Although tumor growth requires the mitochondrial electron transport chain (ETC), the relative contribution of complex I (CI) and complex II (CII), the gatekeepers for initiating electron flow, remains unclear. In this work, we report that the loss of CII, but not that of CI, reduces melanoma tumor growth by increasing antigen presentation and T cell-mediated killing. This is driven by succinate-mediated transcriptional and epigenetic activation of major histocompatibility complex-antigen processing and presentation (MHC-APP) genes independent of interferon signaling. Furthermore, knockout of methylation-controlled J protein (MCJ), to promote electron entry preferentially through CI, provides proof of concept of ETC rewiring to achieve antitumor responses without side effects associated with an overall reduction in mitochondrial respiration in noncancer cells. Our results may hold therapeutic potential for tumors that have reduced MHC-APP expression, a common mechanism of cancer immunoevasion.


The mitochondrial tricarboxylic acid (TCA) cycle and electron transport chain (ETC) provide metabolic plasticity required for cancer growth and progression (1, 2). The ETC comprises four multisubunit complexes: complexes I to IV (CI to CIV). CI [reduced form of NAD+ (NADH) dehydrogenase] and CII [succinate dehydrogenase (SDH)] are the gatekeepers of electron flow, passing electrons from TCA-generated NADH and flavin adenine dinucleotide (FADH2), respectively, to ubiquinone for delivery to CIII and finally to oxygen via CIV. Recycling of ubiquinone by CIII is required for continuous flow of electrons through the ETC initiated by CI or CII and is essential for tumor growth (3-5). However, individual loss-of-function mutations in CI or CII subunits are tolerated in cancer cells through metabolic adaptations that are incompletely understood (6-13). Thus, in this work, we endeavored to learn more about the individual contributions of CI and CII activity to tumor growth, metabolism, and immunogenicity.
Reducing mitochondrial CII activity enhances
antitumor immune responses

To investigate the contributions of CI and CII to tumor growth and antitumor immune responses, we generated isogenic CI (sgNdufa1) or CII (sgSdha or sgSdhc) knockout YUMM1.7 (BrafV600E/wt Pten−/− Cdkn2a−/−) mouse melanoma cells (14) (fig. S1A) and implanted them into immune-competent mice. Knockout of CI or CII subunits reduced the respective complexes as expected but did not affect the abundance of other ETC complexes (fig. S1B). Loss of CI or CII also significantly reduced oxygen consumption, spare respiratory capacity, and cell proliferation (fig. S1, C to E). However, CI knockout cells had lower respiration and proliferation than CII knockout cells, consistent with CI being the major source of electrons to the ETC in YUMM1.7 cells under these culture conditions. Knockout of CI increased NADH levels without affecting the CII substrate succinate, which suggests that CI knockout cells have intact CII-dependent respiration and succinate dehydrogenase activity (fig. S1, F and G). Conversely, knockout of CII increased succinate without affecting NADH levels, indicating that CI respiration is intact in CII knockout cells (fig. S1, F and G) but that CII-dependent respiration and succinate dehydrogenase activity are absent. We next engrafted CI or CII knockout cells into syngeneic, wild-type C57BL/6 mice. Unexpectedly, CI and CII knockout had notably different effects on tumor growth. Despite proliferating more slowly in vitro than CII knockout (sgSdha) and control (sgSCR) cells, CI knockout (sgNdufa1) tumors did not show any growth defects in vivo, which suggests that CI is not required for YUMM1.7 tumor growth (Fig. 1, A and B). However, CII knockout tumors grew significantly slower
compared with control and CI knockout tumors. Flow cytometry analysis of CII knockout tumors showed significantly increased immune cell (CD45+) infiltration, especially CD8+ T cells, relative to control and CI knockout tumors (Fig. 1, C and D). No significant change in the number of CD4+ T cells or the percentage of regulatory T cells (Tregs) was observed (fig. S2, A and B). Consistent with this, CD8+ T cells from CII knockout tumors produced more interferon-γ (IFN-γ+) and granzyme B (GZMB+), which suggests that significant tumor-killing effector function was responsible for the observed antitumor activity (Fig. 1E). This was further confirmed by performing the same experiment in Rag1-deficient mice, which lack mature T cells and B cells; in these mice, the antitumor phenotype of CII knockout was lost (fig. S2, C and D). Because antigen presentation by major histocompatibility complex class I (MHC-I) is a major determinant of T cell activation and killing, we next measured MHC-I expression on CI- and CII-deficient tumor cells. We found significantly higher MHC-I expression on CII knockout tumor cells than on CI knockout and control tumor cells in vivo (Fig. 1F). Furthermore, ablation of antigen presentation by knocking out β2-microglobulin (MHC-I light chain, sgB2m) in CII knockout cells confirmed that tumor antigen presentation is required for the antitumor effect of CII depletion (Fig. 1, G and H, and fig. S2E). Consistent with these data, CIBERSORT correlation analysis of pan-cancer datasets showed a negative correlation between the expression of CII genes (SDHA, SDHB, SDHC, and SDHD) and a cytotoxic T cell gene signature across multiple cancer types (33/36 cancer types) (fig. S3A). Finally, human breast and skin tumors with low expression of SDHC (a CII subunit) also showed increased expression of cytotoxic T lymphocyte (CTL) marker genes (CD8A, CD8B, GZMA, GZMB, IFNG, and PRF1) (fig. S3, B and C, and table S1). Thus, we conclude that the loss of CII, but not of CI, results in a strong antitumor T cell response through increased antigen presentation.
Seemingly contrary to our findings, loss of 
CII function (i.e., SDHx subunit gene mutations) 
can be tumorigenic in humans, and succinate, 
which accumulates under these conditions, has 
been coined an “oncometabolite” (15, 16). Most 
of the oncogenic SDHx mutations are germline 
and result in a subset of rare cancers, such as 
pheochromocytoma, paraganglioma, and gastro- 
intestinal stromal tumors (15). Thus, in these 
cases, CII deficiency and succinate accumula- 
tion would be present from the beginning and 
could promote early events in tumor initiation. 
Our results, by contrast, show that the deple- 
tion of CII reduces tumor growth by enhancing 
antigen presentation; thus, the effect is not on 
tumor initiation but rather on tumor growth 
resulting from immune system attack. It is also 
important to note that the loss of CII in mouse models does not cause spontaneous tumor formation unless other oncogenic conditions are present (17-21). Additionally, inherited oncogenic CII mutations usually affect neuroendocrine tissues within specialized physiological environments and/or can promote specific tumor microenvironments that are conditionally tumorigenic when CII activity is inhibited and/or in combination with other oncogenic genetic alterations (18, 22-24). Finally, it is possible that the immune pressure resulting from CII deficiency that we have uncovered leads to immunoediting, promoting the selection of tumor clones with immunosuppressive properties that escape immune detection and continue to grow.

Fig. 1. Mitochondrial CII inhibition enhances antitumor immunity through increased MHC-I. YUMM1.7-sgSCR (control) (n = 8), sgSdha (CII knockout) (n = 8), and sgNdufa1 (CI knockout) (n = 7) cells (2 x 10°) were subcutaneously injected into the flanks of C57BL/6 male mice, and tumor growth was monitored for 20 days. (A to D) Tumor growth curves: tumor volume versus time (days postimplantation) (A), tumor weight (in grams at day 20) (B), number of tumor-infiltrating CD45+ cells per gram of tumor at day 20 (C), and CD8+ T cells per gram of tumor at day 20 (D). (E) Percentage of IFN-γ+ and GZMB+ tumor-infiltrating CD8+ T cells in tumors at day 20. (F) Tumor surface MHC-I expression on cells from (A) at day 20. (G and H) Tumor growth curve (tumor volume versus time) (G) and tumor weight (in grams at day 21) (H) of tumors from sgSCR (control), sgSdha (CII knockout), sgSCR-sgB2m (control + β2-microglobulin knockout), and sgSdha-sgB2m (CII knockout + β2-microglobulin knockout) YUMM1.7 cells (2 x 10°) subcutaneously implanted in C57BL/6 mice (n = 4). Data points in each panel represent independent samples from two experiments. Data are plotted as means ± SEMs. Statistical significance was determined by Kruskal-Wallis test with Dunn's multiple comparisons test for (A) to (H).


Mitochondrial succinate increases tumor antigen
presentation independently of IFN signaling


Next, to better understand the antitumor response to CII deficiency, we explored the nature of the signal underlying the increased tumor cell antigen presentation. We first determined that depletion of CII increased cell surface expression of MHC-I in YUMM1.7 cells in vitro (Fig. 2A). Notably, pharmacological inhibition of CII by 3-nitropropionic acid (3-NPA), but not CI inhibition by rotenone, was sufficient to increase MHC-I expression in vitro in multiple murine cell lines (YUMM1.7, 4T1, B16-F10, YUMMER, MC38, and fibroblasts; Fig. 2B and fig. S4, A and B). Inhibition of CII, but not of CI, increased the transcripts of several major histocompatibility complex-antigen processing and presentation (MHC-APP) genes in YUMM1.7 cells (Fig. 2C and fig. S4C). Similar
results were obtained with 3-NPA-treated 4T1 
mouse breast cancer cells (fig. S4D). Consistent 
with this, inhibition of CII in YUMM1.7 cells 
expressing ovalbumin (OVA), a model antigen, 
increased the presentation of the OVA-derived 
peptide SIINFEKL bound to MHC-I (H-2Kb),
inhibition of CII. (A) Cell surface MHC-I expression on sgSCR, sgSdha, and sgSdhc YUMM1.7 cells (n = 6). MHC-I expression is presented as fold change relative to sgSCR cells. A representative histogram is shown on the left. (B) Cell surface MHC-I expression on YUMM1.7 (n = 6) and 4T1 (n = 6) cells treated with DMSO (vehicle control), rotenone (CI inhibitor), or 3-NPA (CII inhibitor) for 48 hours. MHC-I expression is presented as fold change relative to DMSO-treated cells. A representative histogram is shown on the left. (C) Reverse-transcription quantitative polymerase chain reaction (RT-qPCR) analysis of indicated representative MHC-APP genes in YUMM1.7 cells treated with DMSO, rotenone, and 3-NPA for 48 hours. Here, B2m is the MHC-I light chain; H2-d and H2-k1 represent MHC-I heavy chains; Lmp2 and Lmp7 represent immunoproteasome subunits; Tap1, Tap2, Tapbp, and Tapbpl represent antigen transporters and the peptide loading complex in the endoplasmic reticulum; and Nlrc5, Irf1, and Stat1 are transcription factors. Expression levels are presented as fold change relative to DMSO-treated cells. Each data point represents a technical replicate from one biological sample. Similar results were obtained with two independent biological replicates. (D) Cell surface MHC-I expression on DMSO- and 3-NPA-treated sgNlrc5 (NLRC5 knockout; n = 4) and sgIrf1 (IRF1 knockout; n = 4) YUMM1.7 cells. MHC-I expression is presented as fold change relative to DMSO-treated sgSCR (control) cells. (E) Steady-state levels of metabolites in
scale depicts the abundance of the metabolites (red, high; blue, low) (n = 2 biologically independent experiments). (F) Cell surface MHC-I expression on YUMM1.7 (n = 3) and 4T1 (n = 3) cells treated with mono-methyl succinate (a cell-permeable form of succinate) dissolved in phosphate-buffered saline (PBS) for 48 hours. MHC-I expression is presented as fold change relative to vehicle (PBS) control cells. (G) Whole-cell succinate levels in YUMM1.7 cells cultured in the presence or absence of glutamine for 16 hours, followed by DMSO or 3-NPA treatment for 24 hours (n = 3). (H) Cell surface MHC-I expression on YUMM1.7 cells cultured in the presence or absence of glutamine for 16 hours followed by DMSO or 3-NPA treatment for 24 hours (n = 4). MHC-I expression is presented as fold change relative to DMSO-treated cells cultured in the presence of glutamine. (I) Cell surface MHC-I expression on sgSCR and sgSdhc (CII knockout) YUMM1.7 cells transfected with either siSCR or siOgdh for 72 hours (n = 3). MHC-I expression is presented as fold change relative to cells transfected with siSCR (sgSCR). Data points in each panel represent an independent sample, unless otherwise specified. Data are plotted as means ± SDs. Statistical significance was determined using one-way analysis of variance (ANOVA) with Dunnett's multiple comparisons test for (A), (B), (D), (F), (G), and (H); two-way ANOVA with Dunnett's multiple comparisons test for (C); and two-way ANOVA with Sidak's multiple comparisons test for (I).
Fig. 3. Mitochondrial succinate regulates antigen presentation through changes in histone methylation. (A) The αKG/succinate ratio in YUMM1.7 cells treated with DMSO, 3-NPA, and 3-NPA + αKG for 6 hours (n = 4). (B) Immunoblot analysis of indicated lysine trimethylation marks on histone 3 in YUMM1.7 cells treated with DMSO, 3-NPA, and 3-NPA + αKG for 24 hours. Histone 3 and ACTIN are the loading controls. Numbers represent band density normalized to histone 3. Similar results were obtained with an independent experiment. (C) Cell surface MHC-I expression on YUMM1.7 cells treated with DMSO, 3-NPA, and 3-NPA + αKG for 48 hours (n = 6). MHC-I expression is presented as fold change relative to DMSO-treated cells. (D) Genome-wide distribution profiles of H3K4me3 (top) and H3K36me3 (bottom) binding based on ChIP-seq reads in YUMM1.7 cells treated with DMSO, 3-NPA, and 3-NPA + αKG for 24 hours. (E) Heatmap representation of H3K4me3 enrichment intensity based on ChIP-seq reads in YUMM1.7 cells treated with DMSO, 3-NPA, and 3-NPA + αKG for 24 hours. Signals within three kilobases of the transcription start site (TSS) are displayed in descending order for each cluster (i.e., gained, maintained, and lost in response to 3-NPA). (F) Heatmap representation of H3K36me3 enrichment intensity based on ChIP-seq reads in YUMM1.7 cells
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confirming enhanced APP upon CII inhibition 
(fig. S4E). To identify the molecular mecha- 
nism by which mitochondrial CII inhibition 
drives the expression of nuclear MHC-APP genes, 
we performed gene expression profiling on CI 
and CII inhibitor-treated YUMM1.7 cells (fig. S5A and tables S2, S3, and S4). CII-inhibited cells showed significant enrichment of the IFN response pathway, including APP genes (fig. S5A and tables S5 and S6). Therefore, we tested whether CII inhibition induces MHC-I through autocrine IFN-mediated signaling by knocking out Ifngr1 or Stat1 in YUMM1.7 cells (fig. S5B). Although IFN-γ-induced MHC-I up-regulation was eliminated in Ifngr1 and Stat1 knockout cells, CII inhibition still led to the induction of surface MHC-I and expression of APP genes in these cells (fig. S5, B to F). Similarly, CII inhibition did not induce signal transducer and activator of transcription 1 (STAT1) phosphorylation at Y701, which would be expected from receptor-mediated activation of IFN signaling (fig. S5G). Next, we tested whether CII inhibition-induced expression of MHC-APP genes required NLRC5 or IRF1, two known transcriptional activators of these genes. Notably, depletion of NLRC5, but not of IRF1, attenuated CII inhibition-induced cell surface MHC-I and expression of MHC-APP genes (Fig. 2D and fig. S5, H and I). Together, these findings show that although IFN signaling is not required, induction of MHC-APP genes in response to CII inhibition remains partially dependent on NLRC5-mediated transcription.

Because CII is succinate dehydrogenase and 
succinate influences nuclear gene expression (2), 
we investigated whether CII inhibition drives 
MHC-APP gene expression by promoting succinate accumulation. Four pieces of evidence indicate that this is the case. First, as predicted, CII-inhibited cells had high levels of succinate (Fig. 2E and fig. S6A). Second, treating wild-type (i.e., CII-competent) cells with cell-permeable succinate was sufficient to increase both cell surface MHC-I and expression of MHC-APP genes (Fig. 2F and fig. S6, B to D). Third, because glutamine is the primary source of succinate in CII-inhibited cells (7), glutamine starvation of YUMM1.7 cells significantly reduced the 3-NPA-induced succinate accumulation and the concomitant expression of cell surface MHC-I and MHC-APP genes (Fig. 2, G and H, and fig. S6E). Lastly, the knockdown of OGDH, a subunit of 2-oxoglutarate dehydrogenase, in CII knockout (sgSdhc) cells significantly reduced succinate levels and MHC-I expression (Fig. 2I and fig. S6, F to H). Consistent with this, an inverse correlation was identified between the expression of CII (especially SDHC) and MHC-APP encoding genes in the cancer cell line encyclopedia (CCLE)
(fig. S7A). Furthermore, analysis of human breast 
and skin cancer samples from tumor sequencing 
studies identified significant down-regulation 
of MHC-APP genes in samples with high SDHC 
expression (fig. S7, B and C, and table S1). 
Therefore, CII inhibition elevates intracellular 
succinate that drives increased transcription 
of MHC-APP genes and antigen presentation 
in both mouse and human cancer cells. Many cancer cells escape the immune system by down-regulating MHC-APP expression or becoming unresponsive to the IFN-γ that activates these genes (25). Thus, our finding that mitochondrial CII inhibition can up-regulate MHC-APP through succinate accumulation independently of IFN signaling has therapeutic implications.
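The Fig. 2 legend reports surface MHC-I as fold change relative to a control condition (sgSCR, DMSO, or PBS), summarized as means ± SDs. A minimal sketch of that normalization follows; it assumes each sample is a single flow cytometry intensity value and that fold change is taken against the mean of the control group, details the legend does not spell out. The function name and the example numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def fold_change_summary(treated: List[float],
                        control: List[float]) -> Tuple[List[float], float, float]:
    """Normalize each treated sample to the control mean; return (fold changes, mean, SD)."""
    baseline = mean(control)
    fold_changes = [x / baseline for x in treated]
    return fold_changes, mean(fold_changes), stdev(fold_changes)


if __name__ == "__main__":
    # Hypothetical MHC-I intensities, for illustration only.
    dmso = [980.0, 1010.0, 1005.0]
    three_npa = [1890.0, 2050.0, 1975.0]
    fc, m, sd = fold_change_summary(three_npa, dmso)
    print([round(x, 2) for x in fc], f"mean {m:.2f} ± SD {sd:.2f}")
```

Normalizing to the control mean (rather than to a single control sample) keeps the control group itself centered at a fold change of 1 while preserving its spread.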


Mitochondrial succinate increases MHC-APP 
gene transcription by inhibiting histone 
demethylases and modulating the tumor 
epigenetic landscape 


Because the intracellular α-ketoglutarate (αKG)/succinate ratio is an important determinant of the enzymatic activity of 2-oxoglutarate-dependent dioxygenases (2-OGDDs), including ten-eleven translocation (TET) family members and lysine-specific demethylases (KDMs) (2, 26, 27), we explored whether inhibition of CII decreases the αKG/succinate ratio, thereby
treated with DMSO, 3-NPA, and 3-NPA + αKG for 24 hours. Signals within two kilobases around the gene body [TSS to TES (transcription end site)] are displayed in descending order for each cluster (i.e., gained and maintained in response to 3-NPA treatment). (G) Volcano plots showing differentially enriched genes for H3K4me3 (top) and H3K36me3 (bottom) modifications from ChIP-seq data comparing DMSO- and 3-NPA-treated YUMM1.7 cells (P < 0.0001 and fold change > 1.25). Peaks enriched for MHC-APP genes are depicted in blue. (H) Bubble plot showing fold change of H3K4me3 enrichment on promoters of representative MHC-APP genes from ChIP-seq dataset comparing DMSO- and 3-NPA-treated YUMM1.7 cells. Color gradient depicts the log10(P value). (I) Genome browser tracks for H3K4me3 marks at Nlrc5, Psmb9, Tap1, Psmb8, and Tap2 loci in ChIP-seq. Boxes indicate significantly enriched peaks (P < 0.0001 and fold change > 1.25) at sites of interest. (J) Genome browser track for H3K36me3 at Tap1 gene body in ChIP-seq. Box indicates significantly enriched peaks (P < 0.0001 and fold change > 1.25) at sites of interest. Data points in each panel represent an independent sample. Data are plotted as means ± SDs. Statistical significance was determined by one-way ANOVA with Dunnett's multiple comparisons test for (A) and (C).


lowering 2-OGDD activity. Inhibition of CII or ° 
addition of cell-permeable succinate to YUMM1.7 
cells significantly reduced the intracellular oKG/ 
succinate ratio and increased trimethylation of 
several key lysine residues of histone H3 that are 
often associated with transcription regulation 
(fig. S8, A to D). Notably, cell-permeable «KG treat- 
ment of ClJ-inhibited (3-NPA) cells increased the 
aKG/succinate ratio and reversed H3 trimeth- 
ylation, cell surface MHC-I, and expression of 
MHC-APP genes (Fig. 3, A to C, and fig. S8, Eand 
F), consistent with succinate-mediated inhibition 
of 2-OGDDs being a key downstream effect of 
CI inhibition. Because treatment of CII-inhibited 
YUMML17 cells with 5-azacytidine (DNA methyl- 
transferase inhibitor) did not prevent the increase 
in MHC-I expression but rather increased it fur- 
ther (fig. S8G), we conclude that inhibition of 
TET DNA demethylase activity by succinate is 
likely not a major contributor to CII inhibition- 
mediated MHC-APP expression. 

Fig. 4. Knockout of the mitochondrial CI inhibitor MCJ rewires the ETC to increase tumor immunogenicity without reducing OXPHOS: A therapeutic proof of concept. (A) Relative succinate levels in sgSCR (control) and sgMcj (Mcj knockout) YUMM1.7 cells. Data are presented as fold change relative to sgSCR cells. (B) Cell surface MHC-I expression on sgSCR and sgMcj YUMM1.7 cells (n = 10). MHC-I expression is presented as fold change relative to sgSCR cells. (C) RT-qPCR analysis of indicated representative MHC-APP genes in sgSCR and sgMcj YUMM1.7 cells. Expression levels are presented as fold change relative to sgSCR cells. Each data point represents a technical replicate of one biological sample. Similar results were obtained with an independent biological replicate. (D) Cell surface MHC-I expression on sgMcj YUMM1.7 cells ± SDHA overexpression (n = 4). Data are presented as the fold change relative to sgSCR-vector cells. (E to I) YUMM1.7-sgSCR (n = 5 mice) and sgMcj (n = 5 mice) cells were subcutaneously injected in flanks of C57BL/6 male mice and monitored for tumor formation for 20 days. Shown are tumor growth curves (tumor volume versus time is plotted) (E), tumor weight at day 20 (in grams) (F), cell surface MHC-I expression relative to sgSCR tumor cells at day 20 (G), number of tumor-infiltrating CD45+ cells (per gram of tumor) at day 20 (H), and CD4+ and CD8+ T cells (per gram of tumor) at day 20 (I). These data are representative of three independent experiments. (J) Percentage of IFN-γ+ and GZMB+ tumor-infiltrating CD8+ T cells in sgSCR and sgMcj YUMM1.7 tumors. (K) Tumor weights in grams of sgSCR and sgMcj YUMM1.7 tumors from C57BL/6 mice treated with immunoglobulin G (IgG) isotype (αIgG) and anti-CD8 (αCD8) depleting antibodies for 21 days

The state of histone methylation is determined by the relative activities of KDMs and histone methyltransferases (fig. S9A). Succinate-mediated inhibition of KDMs shifts the balance toward increased histone methylation; therefore, we inhibited KDMs to determine whether they are involved in CII inhibition/succinate-mediated increases in MHC-APP. We found that inhibition of the KDM5 family (H3K4me3 demethylases) by KDM5-C70 increased MHC-I expression comparably to CII inhibition, accompanied by increased levels of H3K4me3 in YUMM1.7 cells (fig. S9, B and C). Next, we knocked down the histone methyltransferases specific to H3K4me3 (KMT2A and KMT2B), H3K36me3 (SETD2), and H3K27me3 (EZH2) in CII knockout (sgSdhc) cells to restore the respective histone methylation (fig. S9D). Although knockdown of these enzymes did not affect the IFN-γ-induced increase in MHC-I, knockdown of KMT2A and SETD2 reversed the effects of increased succinate on expression of surface MHC-I and MHC-APP genes (fig. S9, E to H). Thus, H3K4me3 and H3K36me3 are key marks that regulate antigen presentation in response to succinate accumulation.
To assess the effect of CII inhibition on global epigenetic reprogramming, we performed chromatin immunoprecipitation sequencing (ChIP-seq) for H3K4me3 and H3K36me3 in YUMM1.7 cells treated with 3-NPA, 3-NPA and αKG (to compete with succinate), or dimethyl sulfoxide (DMSO) as the vehicle control. As expected, H3K4me3 and H3K36me3 signals exhibited genome-wide gains after CII inhibition in YUMM1.7 cells, which were notably reversed by αKG supplementation (Fig. 3D and tables S7 to S10). Specifically, cells treated with 3-NPA gained 8098 H3K4me3 peaks and 27,334 H3K36me3 peaks compared with DMSO (Fig. 3, E and F). However, when αKG was added to the 3-NPA-treated cells, the global trends were reversed, with only 2612 gained H3K4me3 peaks and 10,741 gained H3K36me3 peaks compared with DMSO (Fig. 3, E and F). Notably, several MHC-APP genes were significantly enriched for H3K4me3 and H3K36me3 (Fig. 3, G and H, and tables S11 and S12). For example, H3K4me3 levels were significantly increased in the promoter regions of Nlrc5, Psmb9, Tap1, and Psmb8, and this was markedly reversed by αKG treatment (Fig. 3I). Similarly, the gene body of Tap1 showed a marked increase in H3K36me3, which was significantly rescued by αKG treatment (Fig. 3J). On the basis of our transcription factor and epigenetic results, we propose a minimal model for how succinate activates MHC-APP transcription: succinate accumulation downstream of CII inhibition alters the epigenetic landscape of the MHC-APP genes by suppressing KDM4 and KDM5 histone demethylase activity and by increasing NLRC5 levels, which cooperatively induce transcription of the MHC-APP genes. This, in turn, promotes more efficient tumor antigen presentation in YUMM1.7 melanoma cells. Although other transcription factors and epigenetic modifiers are likely involved in the effects of succinate that we observe, these results highlight that the MHC-I locus is responsive to even subtle epigenetic changes, which might serve as a sensitive litmus test of mitochondrial ETC activity in cells.
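The peak counts above can be put on a common scale with a short calculation. The Python sketch below is illustrative only and is not the ChIP-seq pipeline used here: it encodes the significance criterion quoted in the Fig. 3 legend (P < 0.0001 and fold change > 1.25) as a hypothetical helper, `is_enriched`, and computes the net reduction in gained peaks when αKG is added to 3-NPA from the counts reported in the text. It assumes the gained-peak counts are directly comparable between conditions.

```python
# Significance criterion quoted in the Fig. 3 legend for enriched peaks
P_CUTOFF = 1e-4    # P < 0.0001
FC_CUTOFF = 1.25   # fold change > 1.25


def is_enriched(p_value: float, fold_change: float) -> bool:
    """Hypothetical helper: True only if a peak passes both strict cutoffs."""
    return p_value < P_CUTOFF and fold_change > FC_CUTOFF


# Gained peaks versus DMSO, as reported in the text (aKG = alpha-ketoglutarate)
GAINED = {
    "H3K4me3": {"3-NPA": 8098, "3-NPA + aKG": 2612},
    "H3K36me3": {"3-NPA": 27334, "3-NPA + aKG": 10741},
}


def net_reduction(mark: str) -> float:
    """Fractional drop in gained peaks when aKG is added to 3-NPA."""
    counts = GAINED[mark]
    return 1 - counts["3-NPA + aKG"] / counts["3-NPA"]


if __name__ == "__main__":
    for mark in GAINED:
        print(f"{mark}: {net_reduction(mark):.1%} fewer gained peaks with aKG")
```

By this measure, adding αKG lowers the number of gained peaks by about 68% for H3K4me3 and about 61% for H3K36me3. This is a net count comparison; it does not track which individual peaks were lost or retained.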


Increasing relative electron flow through
mitochondrial CI enhances tumor immunogenicity
and T cell receptor (TCR) repertoire diversity


Previous studies have demonstrated the proinflammatory effects of succinate in macrophages and T cells (28-30). Our work also shows that accumulation of succinate within tumor cells is proinflammatory by increasing antigen presentation that activates T cell-mediated killing. However, systemic inhibition of CII is likely not a viable approach to increase tumor succinate levels because it could initiate de novo tumorigenesis, is neurotoxic, and very likely would have other adverse physiological effects in normal cells and tissues because of reduced mitochondrial ETC activity and adenosine 5'-triphosphate (ATP) production (28, 31). Conditions that rewire the ETC in favor of CI-driven electron flow over that from CII, however, might reduce CII activity enough to allow succinate accumulation without significant reductions in overall ETC activity and ATP production. The formation of CI-containing supercomplexes (e.g., the respirasome) has been proposed to generate two different pools of ubiquinone that enhance CI-to-CIII electron flow and concomitantly decrease the contribution from CII to CIII (32, 33). Methylation-controlled J protein (MCJ) is an endogenous CI-interacting protein in the inner mitochondrial membrane, knockout of which leads to increased CI activity over CII and the formation of supercomplexes (34). Thus, we hypothesized that rewiring the ETC by knockout of MCJ would reduce CII activity without reducing overall ETC and ATP production and would provide increased succinate for MHC-APP expression and enhanced antitumor immunity. Notably, knockout of MCJ (Mcj-KO) in YUMM1.7 cells increased CI-containing supercomplexes and CI + CIII activity with an associated decrease in CII + CIII activity, resulting in increased levels of intracellular succinate, antigen presentation, and expression of MHC-APP genes (Fig. 4, A to C, and fig. S10, B and C). Similarly, YUMMER and B16-F10 Mcj-KO cells also had increased succinate, cell surface MHC-I, and expression of MHC-APP genes (fig. S10, D to F). Notably, ectopic expression of SDHA or knockdown of Ogdh in Mcj-KO YUMM1.7 cells reduced MHC-I expression, demonstrating the importance of succinate in driving MHC-I expression (Fig. 4D and fig. S11, A to C). Furthermore, ChIP-quantitative polymerase chain reaction (ChIP-qPCR) analysis of promoters of representative MHC-APP genes in Mcj-KO cells confirmed increased H3K4me3 enrichment (fig. S11D), similar to direct CII inhibition. Next, we subcutaneously injected Mcj-KO YUMM1.7 cells into syngeneic C57BL/6 mice to test the effect of rewiring the ETC away from CII on tumor growth. Compared with control (sgSCR) YUMM1.7 tumors, Mcj-KO tumors grew substantially slower, maintained high levels of MHC-I, and contained greater numbers of CD45+ immune cells, especially CD8+ T cells (Fig. 4, E to I, and fig. S12A), thus mirroring the effects that we observed with CII knockout YUMM1.7 cells in vivo. Similar results were observed in the more immunogenic YUMMER mouse melanoma tumor model (35) (fig. S12, B to D). Further inspection of tumor-infiltrating CD8+ T cells showed that the cells from sgSCR control tumors expressed the activation markers CD44 and CD69 as well as the transcription factor TCF-1 (which is important for memory T cells and their precursors and is encoded by the Tcf7 gene), whereas CD8+ T cells in Mcj-KO tumors expressed greater amounts of PD-1, TIM3, CXCR6, TOX, and effector molecules such as IFN-γ and GZMB (Fig. 4J and fig. S12E). Furthermore, antibody-mediated depletion of CD8+ T cells restored the growth of Mcj-KO tumors, confirming the role of CD8+ T cells in tumor control (Fig. 4K). These results show that the CD8+ tumor-infiltrating lymphocytes (TILs) in Mcj-KO tumors receive more antigenic signaling than those in control (sgSCR) tumors, which induces their differentiation into effector and exhausted cell states.

every second day. (L) Uniform manifold approximation and projection (UMAP) of 8274 CD8+ T cells from YUMM1.7-sgSCR and sgMcj tumors showing the formation of two clusters with the respective labels. Each dot corresponds to a single cell, color-coded by the sample type (gray, sgSCR; orange, sgMcj). (M) UMAP (Seurat) clustering of CD8+ T cells into four distinct clusters according to differentiation and functional marker expression. Each dot represents a single cell, color-coded by the cluster type. (N) UMAPs showing average expression of functional signatures in CD8+ T cell clusters identified in (M). The differentiation and functional markers defining the cluster are shown at the top. (O) UMAP of CD8+ T cells overlaid with TCR clonal abundance. Each dot represents a single cell, color-coded by the number of TCR clones present. Data points in each panel represent an independent sample unless otherwise specified. Data are plotted as means ± SDs for (A) to (D) and means ± SEMs for (E) to (K). Statistical significance was determined by unpaired Welch t test for (A) and (B), two-way ANOVA with Dunnett's multiple comparisons test for (C), one-way ANOVA with Dunnett's multiple comparisons test for (D), unpaired Mann-Whitney test for (E) to (H) and (J), two-way ANOVA with Sidak's multiple comparisons test for (I), and two-way ANOVA with Tukey's multiple comparisons test for (K).

Next, we performed single-cell transcriptomics coupled with single-cell TCR sequencing to profile the mRNA and TCRαβ repertoire of CD8+ T cells from YUMM1.7 Mcj-KO and sgSCR tumors. Notably, cells from Mcj-KO and sgSCR tumors clustered distinctly (Fig. 4L and fig. S13A). Whereas the majority of the CD8+ T cells in sgSCR tumors clustered together and displayed signatures of memory CD8+ T cells (Tcf7, KI, Id3, Sox4, Sell, Cxcr3, Gzma, Gzmk, Cd69, and Il7r), the CD8+ T cells from Mcj-KO tumors clustered into PD1+ CXCR6+ CD8+ T cells (Pdcd1, Cxcr6, Havcr2, Id2, Gzmb, Cd38, and Fasl), PD1+ XCL1+ CD8+ T cells (Xcl1, Penk, Ifng, Tigit, Nr4a2, Bhlhe40, Csf2, Cel, and Prf1), and proliferating PD1+ CD8+ T cells (Birc5, Mki67, Top2a, Cenpf1, Tubb4b, H2afx, and Pdcd1), confirming observations from fluorescence-activated cell sorting (FACS) analysis (Fig. 4, M and N). Additionally, whereas most CD8+ T cells from sgSCR controls had unique, single TCR clonotypes (clone size of 1) with high clonotype diversity as calculated by the Shannon, Chao, and ACE indexes (fig. S13, B and C), CD8+ T cells from Mcj-KO tumors consisted of hyperexpanded clones (>100 T cells with identical TCRα and β chains) that expressed Pdcd1, Cxcr6, Gzmb, and Ifng (within the 10 most abundant clones) (Fig. 4O and fig. S13, D and E). Notably, although only a few clones were present in both sgSCR and Mcj-KO tumors, most of the expanded clones in Mcj-KO tumors were unique (fig. S13F), which points to increased expression of a broader set of tumor antigens as a result of increased MHC-I. Collectively, these results show that increasing intracellular tumor succinate and MHC-I (through Mcj-KO or direct CII inhibition) potentiates tumor cell immunogenicity and the activation and infiltration of more tumor-reactive effector CD8+ T cells that suppress tumor growth. Our results indicate that discrete rewiring of the ETC to moderately reduce CII activity or increase succinate in tumors not only improves T cell engagement by increasing MHC-I but also enhances the selective expansion of protective T cell clones, which suggests that tumor MHC-I levels determine tolerogenic versus immunogenic set points for tumor antigens. Thus, this ETC-rewiring approach might represent a one-two punch to convert cold tumors to hot and to improve antitumor responses and immunotherapy efficacy.
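The clonotype-diversity comparison above relies on standard ecological indexes. As a minimal sketch, assuming clone sizes (the number of cells sharing an identical TCRαβ) have already been tabulated per tumor, the Python below computes the Shannon index with natural logarithms and the bias-corrected form of the Chao1 richness estimator. The helper names and the toy repertoires are hypothetical, not data from this study; the ACE index is omitted, and the exact index variants used for fig. S13 may differ.

```python
import math
from collections import Counter


def clone_sizes(clonotype_ids):
    """Count cells per clonotype (e.g., per identical TCR alpha/beta pair)."""
    return list(Counter(clonotype_ids).values())


def shannon_index(sizes):
    """Shannon diversity H = -sum(p_i * ln p_i) over clonotypes."""
    total = sum(sizes)
    return -sum((n / total) * math.log(n / total) for n in sizes if n > 0)


def chao1(sizes):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    s_obs = sum(1 for n in sizes if n > 0)
    f1 = sum(1 for n in sizes if n == 1)  # singleton clonotypes
    f2 = sum(1 for n in sizes if n == 2)  # doubleton clonotypes
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy repertoires (not study data)
polyclonal = [1] * 10         # ten unique clones of size 1
expanded = [100, 3, 2, 1, 1]  # one hyperexpanded clone dominates

for name, sizes in [("polyclonal", polyclonal), ("expanded", expanded)]:
    print(f"{name}: Shannon={shannon_index(sizes):.2f}, Chao1={chao1(sizes):.1f}")
```

In this toy case, the repertoire dominated by one hyperexpanded clone scores lower on both indexes, which is the direction described for Mcj-KO versus sgSCR tumors. The bias-corrected Chao1 form is used here because it stays defined when there are no doubletons (F2 = 0).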
Reproductive outcomes after pregnancy-induced
displacement of preexisting microchimeric cells 
Pregnancy confers partner-specific protection against complications in future pregnancy that parallels persistence of fetal microchimeric cells (FMcs) in mothers after parturition. We show that preexisting FMcs become displaced by new FMcs during pregnancy and that tonic FMc stimulation is essential for expansion of protective fetal-specific forkhead box P3 (FOXP3)-positive regulatory T cells (Treg cells). Maternal microchimeric cells and accumulation of Treg cells with noninherited maternal antigen (NIMA) specificity are similarly overturned in daughters after pregnancy, highlighting a fixed microchimeric cell niche. Whereas NIMA-specific tolerance is functionally erased by pregnancy, partner-specific resiliency against pregnancy complications persists in mothers despite paternity changes in an intervening pregnancy. Persistent fetal tolerance reflects FOXP3 expression plasticity, which allows mothers to more durably remember their babies, whereas daughters forget their mothers with new pregnancy-imprinted immunological memories.


Immune tolerance expands during pregnancy to avert premature rejection of fetal cells and tissues, paralleling expansion of suppressive CD4 cells identified by the forkhead box P3 (FOXP3) transcriptional regulator, called regulatory T cells (Treg cells) (1). Maternal FOXP3+ cells progressively expand throughout pregnancy, whereas blunted expansion is recognized in a variety of pregnancy complications associated with fetal intolerance, including preeclampsia, preterm birth, and stillbirth (2-4). The necessity of FOXP3+ cells is further demonstrated in preclinical studies showing that fetal antigen stimulation primes maternal Treg cell expansion and that fetal wastage can be triggered by FOXP3+ cell depletion (5-8). Maternal Treg cells with fetal specificity remain expanded in mothers after parturition, and their accelerated reaccumulation in subsequent pregnancy represents an instructive framework for investigating how prior pregnancy confers partner-specific resiliency against complications in future pregnancy (9-12). However, memory features of fetal-specific Treg cells also raise questions as to whether low-level antigenic stimulation is required, similar to effector CD4 cells with microbial specificity (13-15), and, if such reminders are indeed necessary, what the postpartum fetal antigen source is.

Bidirectional transfer of cells between mother and offspring occurs ubiquitously during pregnancy, and these genetically foreign cells establish microchimerism in both individuals after parturition (16-19). Maternal microchimeric cell (MMc) persistence in human and rodent offspring sustains expansion of FOXP3+ Treg cells with noninherited maternal antigen (NIMA) specificity (20, 21) and enforces fetal tolerance during next-generation pregnancies sired by males with overlapping NIMAs (22). Given these parallels in reproductive tolerance associated with expanded Treg cells, we considered whether fetal microchimeric cells (FMcs) can similarly sustain fetal-specific memory Treg cells. However, this reasoning raises additional questions about the interplay between preexisting MMcs and discordant FMcs seeded during pregnancy. Furthermore, how preexisting microchimeric cells respond to each wave of different fetal cells encountered in successive pregnancies, and the impact on tolerance to familial antigens expressed by microchimeric cells in each context, remain undefined. We addressed these knowledge gaps in mammalian reproduction, which pertain to how prior pregnancy affects the outcome of future pregnancies in mothers and their daughters, by strategically using transgenic mice that express defined model antigens for mating, thereby transforming model antigens into surrogate fetal antigens or NIMAs, and by investigating how pregnancy sired by genetically discordant third-party males affects tolerance to familial antigens.
Plasticity in NIMA-specific tolerance
Although MMcs are essential for maintaining NIMA-specific Treg cell expansion and enforcing fetal tolerance in daughters during first pregnancies sired by males expressing shared NIMAs (22), the diversity of alloantigens between individuals in outbred populations makes overlap between paternal (fetal)-expressed antigens and NIMAs uncommon. In this regard, how pregnancy by males without overlapping NIMAs affects NIMA-specific tolerance and MMcs remains uncertain. We evaluated these outstanding questions by comparing levels of OVA:2W1S+ MMcs in OVA:2W1S-negative mice born to OVA:2W1S+/− mothers (fig. S1) (22) after allogeneic pregnancy sired by third-party nontransgenic CBA H-2k males (Fig. 1A). We found sharply reduced levels of OVA:2W1S+ MMcs, with undetectable levels in most tissues after parturition, compared with OVA:2W1S+ MMcs retained in the tissues of age-matched virgin controls (Fig. 1A and fig. S2). Loss of OVA:2W1S+ MMcs in postpartum mice paralleled accrual of H-2k FMcs (Fig. 1A and fig. S2), which suggests natural displacement of preexisting MMcs by FMcs in daughters after pregnancy.

Expanded FOXP3+ Treg cells with NIMA specificity and resiliency against complications during pregnancies sired by males who express shared NIMAs are hallmark features of NIMA-specific tolerance (20-22). To investigate consequences of pregnancy-induced loss of MMcs, we evaluated CD4 cells with NIMA specificity and outcomes of secondary pregnancy by males with overlapping NIMAs. Endogenous cells with I-Ab:2W1S55-68 NIMA specificity were identified after major histocompatibility complex (MHC) class II tetramer staining and enrichment that exploits their high precursor frequency in C57BL/6 mice (fig. S3) (22, 23). These experiments showed a reduced percentage of FOXP3+ Treg cells but similar numbers of 2W1S-NIMA-specific CD4 cells in secondary lymphoid organs (spleen and peripheral lymph nodes) after pregnancy-induced MMc displacement (Fig. 1B). Pregnancy outcomes were evaluated after Listeria monocytogenes (Lm) infection, which disrupts fetal tolerance by dampening maternal Treg cell suppressive potency and causes fetal wastage through decidual accumulation of activated fetal-specific effector T cells (24, 25). These experiments showed that NIMA-enforced resiliency against prenatal infection (22) was overturned with MMc loss and NIMA-specific Treg cell contraction. Prenatal Lm infection-induced fetal wastage, congenital invasion, and loss of live pups each increased during second pregnancy sired by BALB/c (H-2d) males in H-2d-NIMA females (fig. S4) previously mated with CBA H-2k males, compared with H-2d-NIMA females during first pregnancy by BALB/c (H-2d) males (Fig. 1C). These consequences of overriding NIMA-specific tolerance, with loss of reproductive benefits in daughters after pregnancy by males without shared NIMA specificity, are identical to those of MMc depletion by using antibodies (22) and, together, illustrate physiological plasticity in NIMA-specific tolerance, with MMcs susceptible to displacement by pregnancy-induced FMcs.

Fig. 1. MMcs and NIMA-specific tolerance displaced after pregnancy. (A) Levels of ovalbumin (OVA) DNA specific to OVA:2W1S+ MMcs or H-2k DNA specific to H-2k FMcs in each tissue among OVA:2W1S NIMA mice 21 days postpartum (PP) after allogeneic pregnancy sired by H-2k CBA males (red) or age-matched 12- to 14-week-old virgin OVA:2W1S-NIMA mice (blue). GEq, genome equivalents. (B) Representative plots and composite data showing FOXP3+ among CD4 cells with I-Ab:2W1S surrogate NIMA specificity and numbers of I-Ab:2W1S+ CD4 cells in pooled secondary lymphoid organs for the mice described in (A). (C) Percent fetal wastage, average recoverable colony-forming units (CFUs) from concepti in each litter, and numbers of live pups per litter 5 days after maternal Lm infection at midgestation [embryonic day 11.5 (E11.5)] for naive H-2b C57BL/6 mice during primary pregnancy sired by H-2d BALB/c males (black), compared with genetically identical H-2d NIMA mice undergoing primary pregnancy sired by H-2d BALB/c males (blue) or H-2d NIMA mice during secondary pregnancy by H-2d BALB/c males with prior pregnancy by H-2k CBA males (red). Each point indicates the data from an individual mouse and is representative of at least three independent experiments, each with similar results. Bar, mean ± standard error.
Fetal-specific tolerance durability

Fetal-specific Treg cells persist in mothers after parturition in parallel with FMcs (16-19) and share with NIMA-specific Treg cells the requirement for microchimeric antigen-presenting cells (APCs): pregnancy with I-A‘ OVA:2W1S+ surgically transferred embryos does not selectively prime expansion of Treg cells with fetal I-Ab:2W1S55-68 specificity (fig. S5) (22), highlighting fetal APC-controlled Treg cell differentiation along with maternal APC-primed CD4 cell expansion (26). Fetal-specific Treg cells reaccumulate with increased tempo upon fetal antigen restimulation, which is associated with enhanced resiliency against Lm prenatal infection and other perturbations in fetal tolerance that require MHC haplotype-shared paternity in first and second pregnancies (7, 12). These
preclinical findings are consistent with the reduced incidence of preeclampsia and other pregnancy complications in women with a prior healthy pregnancy, and with the partner specificity of these protective benefits (9-12). Given the plasticity of NIMA-specific tolerance, with MMc displacement and loss of resiliency against pregnancy complications after pregnancy sired by males without shared NIMAs, analogous experiments evaluated how a change in paternity affects preexisting FMcs and fetal-specific tolerance primed by primary pregnancy (Fig. 2A). Similar to MMc displacement, OVA:2W1S+ FMcs seeded after pregnancy by OVA:2W1S+ males were reduced to background levels after secondary pregnancy by nontransgenic CBA H-2k males, with reciprocal accumulation of new fetal H-2k FMcs (Fig. 2A). In turn, FOXP3+ Treg cells among CD4 cells with fetal I-Ab:2W1S specificity declined after second pregnancy by CBA males compared with controls with only a single pregnancy by OVA:2W1S+ males (Fig. 2B). Thus, sustained expansion of FOXP3+ cells with fetal and NIMA specificity shares similar requirements for tonic stimulation by FMcs and MMcs, respectively.

Fig. 2. Pregnancy-induced FMc displacement does not override fetal tolerance. (A) Levels of OVA DNA specific to OVA:2W1S+ FMcs from primary pregnancy or H-2k DNA specific to H-2k FMcs from secondary pregnancy in each tissue among C57BL/6 female mice 9 to 10 weeks after a single pregnancy sired by OVA:2W1S+ H-2d BALB/c males (blue) compared with mice 3 weeks after second pregnancy sired by H-2k CBA males (red). (B) Representative plots and composite data showing percent FOXP3+ among CD4 cells with I-Ab:2W1S surrogate fetal specificity and numbers of I-Ab:2W1S+ CD4 cells in pooled secondary lymphoid organs for the mice described in (A). (C) Percent fetal wastage, average recoverable CFUs from concepti in each litter, and number of live pups per litter 5 days after maternal Lm infection at midgestation (E11.5) for naive mice during primary pregnancy sired by H-2d BALB/c males (black), compared with mice during secondary pregnancy sired by H-2d BALB/c males (blue) or during tertiary pregnancy sired by H-2d BALB/c males with intervening secondary pregnancy sired by H-2k CBA males (red). Each point indicates the data from an individual mouse and is representative of at least three independent experiments, each with similar results. Bar, mean ± standard error.

Consequences associated with FMc loss and fetal-specific FOXP3+ Treg cell contraction were evaluated by comparing outcomes of tertiary pregnancy sired by males genetically identical to those used for primary pregnancy. We reasoned that if secondary pregnancy by third-party males overrides tolerance to antigens primed by primary pregnancy, then protection against Lm prenatal infection would be similarly overturned. Unexpectedly, levels of Lm infection-induced fetal wastage, congenital invasion, and live pups during tertiary pregnancy sired by H-2d OVA:2W1S+ males in mice with intervening secondary pregnancy by CBA H-2k males were nearly identical to those in mice undergoing secondary pregnancy by H-2d OVA:2W1S+ males and sharply reduced compared with control mice undergoing primary allogeneic pregnancy (Fig. 2C). Thus, despite shared requirements for FMcs and MMcs in maintaining expanded FOXP3+ Treg cells with fetal and NIMA specificity, respectively, loss of FMcs does not functionally erase fetal tolerance in the way that loss of MMcs eliminates NIMA-specific tolerance. In other words, mothers develop lasting functional memory of their offspring, despite FMc displacement during intervening pregnancy, compared with less durable memory by daughters for their mothers.

These differences in the necessity of FMcs and MMcs for maintaining tolerance to fetal-expressed antigens and NIMAs, respectively, prompted additional investigation adapting anti-OVA immunoglobulin G (IgG), previously used for depleting MMcs (22), to deplete OVA:2W1S+ FMcs. We reasoned that FMc depletion by this more decisive approach would allow their necessity for maintaining fetal tolerance to be evaluated more definitively. These experiments showed near-complete loss of OVA:2W1S+ FMcs after anti-OVA IgG administration in female mice after pregnancy sired by OVA:2W1S+ males (Fig. 3A), with reduced FOXP3+ Treg cells but similar numbers of I-Ab:2W1S fetal-specific CD4 cells (Fig. 3B). Thus, sustained expansion of fetal-specific FOXP3+ Treg cells in mothers after parturition requires tonic FMc stimulation, as shown by antibody depletion or by natural displacement with new paternity in subsequent pregnancy (Figs. 2B and 3B).

Fig. 3. Resiliency against fetal wastage despite FMc depletion. (A) Levels of OVA DNA specific to OVA:2W1S+ FMcs in each tissue of mice 30 days PP from primary pregnancy sired by OVA:2W1S+ H-2d BALB/c males treated with α-OVA IgG on days 2 and 16 after parturition (red) or rabbit IgG (blue), or virgin controls (black). (B) Percent FOXP3+ among CD4 cells with I-Ab:2W1S surrogate fetal specificity and numbers of I-Ab:2W1S+ CD4 cells in pooled secondary lymphoid organs for the mice described in (A). (C) Percent fetal wastage, average recoverable CFUs from concepti in each litter, and number of live pups per litter 5 days after maternal Lm infection at midgestation (E11.5) for mice during primary pregnancy sired by OVA:2W1S+ H-2d BALB/c males (black), compared with mice during secondary pregnancy sired by OVA:2W1S+ H-2d BALB/c males that were treated with rabbit IgG (blue) or anti-OVA IgG (red) PP on days 2 and 16 after primary pregnancy. (D) Percent FOXP3+ among CD4 cells with I-Ab:2W1S surrogate fetal specificity, number of I-Ab:2W1S+ FOXP3+ CD4 cells in pooled secondary lymphoid organs, and fold-expansion of these cells at midgestation during primary pregnancy (black) or during secondary pregnancy in mice treated with rabbit IgG (blue) or α-OVA IgG (red). Each point indicates the data from an individual mouse and is representative of at least three independent experiments, each with similar results. Bar, mean ± standard error.

To further delineate consequences of FMc depletion with ensuing FOXP3+ Treg cell contraction, prenatal Lm susceptibility during second pregnancy was evaluated in mice depleted of FMcs after primary pregnancy. These experiments showed sustained protection against Lm infection during second pregnancy sired by the same males, regardless of interpregnancy depletion of H-2d OVA:2W1S+ FMcs, compared with sharply increased Lm susceptibility during primary pregnancy (Fig. 3C). Resiliency against fetal wastage in mice depleted of H-2d OVA:2W1S+ FMcs was still
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Fig. 4. FOXP3 plasticity sustains resiliency against pregnancy complications. 
(A) Percent FOXP3(GFP)-negative among dTOMATO* CD4 cells with |-A°:2W1S 
fetal specificity or tetramer-negative bulk cells in pooled secondary lymphoid organs 
for FOXP3(GFP)°"= R26dTOMATO mice without prior pregnancy (virgin, black), 

or mice 30 days after primary pregnancy sired by OVA:2W1S* BALB/c males 
treated with rabbit IgG (blue) or a-OVA IgG (red) on days 2 and 16 after parturition. 
(B) Percent FOXP3(GFP)-negative cells among dTOMATO* CD4 cells with |-A°:-2W1S 
surrogate fetal specificity or tetramer-negative bulk cells in pooled secondary 
lymphoid organs of FOXP3(GFP)°"= R26dTOMATO mice after primary pregnancy 
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sired by OVA:2W1S* BALB/c males (blue) or additional secondary pregnancy sired 
by third-party H-2* CBA males (red). (C) Expression of each molecule by FOXP3 
(GFP)* compared with FOXP3(GFP)-negative cells among dTOMATO* CD4 cells 
with |-A®-2W1S surrogate fetal specificity in PP mice after primary pregnancy 

by OVA:2W1S* BALB/c males (blue) or in PP mice with OVA:2W1S* FMc depletion 
by a-OVA IgG (red circles) or FMc displacement with secondary pregnancy by 

H-2* CBA males (red squares). gMFI, geometric mean fluorescent intensity. (D) Percent 
FOXP3* among PP donor CD90.2* CD45.1* FOXP3°'® (solid) or PP donor CD90.2* 
CD45.2* FOXP3"" (open) CD4 cells with -A?:2WIS fetal specificity in pooled secondary 
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lymphoid organs of CD90.1* (CD90.2-negative) recipients midgestation (E11.5) 
during allogenic pregnancy sired by OVA:2W1S* BALB/c males administered 
DT for 2 consecutive days after adoptive cell transfer and before mating. 
Splenocytes from PP FOXP3°'® and FOXP3™™ donors were harvested 30 days 
after primary pregnancy by OVA:2W1S* BALB/c males, with some mice 
administered a-OVA or rabbit IgG (on days 2 and 16 after parturition). 

(E) Percent fetal wastage and average recoverable CFUs from concepti in each 


litter, 5 days after maternal Lm infection midgestation 


associated with robust fetal-2W1S FOXP3* 
Treg cell reexpansion, similar to control mice 
without FMc depletion (Fig. 3D). Given di- 
minished numbers of fetal-2W1S-specific 
FOXP3* Tyee cells with OVA:2W1S* FMc deple- 
tion after primary pregnancy, the ability of 
these cells to numerically catch up during sec- 
ondary pregnancy reflects accelerated reex- 
pansion with fetal-2W1S restimulation (Fig. 
3D), which cannot be explained by rebounding 
OVA:2WI1S* FMcs because their numbers in 
mice treated with anti-OVA IgG were not in- 
creased and were even reduced in some tissue 
(iver) compared with those in control mice un- 
dergoing second pregnancy (fig. S6). Together, 
these results indicate that whereas MMcs and 
FMcs are each essential for sustained expansion 
of FOXP3* Tyg cells with familial antigen spec- 
ificity, fetal tolerance is further reinforced in 
mothers by CD4 cells poised for FOXP3" re- 
expansion with fetal antigen restimulation in 
subsequent pregnancy. 


Fetal-specific CD4 cell FOXP3-expression plasticity 


FOXP3 expression plasticity is increasingly as- 
sociated with a variety of autoimmune disorders, 
with cells that turn off FOXP3 expression called 
“eX-Tyeg Cells” (27-30). To investigate whether 
FOXP3* Ty eg cell contraction after FMc deple- 
tion reflects ex-T,., cell differentiation, CD4 
cells were evaluated after pregnancy sired by 
OVA:2WIS* males and depletion of OVA:2W1S* 
FMcs in FOXP3-lineage fate-tracking FOXP3 
(GFP)® R26dTOMATO mice (GFP, green fluo- 
rescent protein), in which FOXP3* cells and their 
progeny are permanently marked by dTOMATO 
expression, which allows discrimination be- 
tween dTOMATO* FOXP3(GFP)* Tyeg cells and 
dTOMATO* FOXP3(GFP)-negative ex-Tyeg 
cells. These experiments showed that ~40% of 
dTOMATO* cells with fetal-2W1S specificity in 
postpartum mice were FOXP3(GFP)-negative, 
which indicates repressed FOXP3 expression 
in previous FOXP3* cells (Fig. 4, A and B). En- 
hanced ex-Tyg cell differentiation was observed 
only among CD4 cells with fetal-2W1S speci- 
ficity because most (~80%) dTOMATO*% cells 
of the same specificity in virgin control mice 
and tetramer-negative cells with bulk spec- 
ificity in postpartum mice retained FOXP3 
(GFP) positivity. 

Besides loss of FOXP3, ex-T,.g¢ cells com- 
pared with FOXP3* cells in postpartum mice 


have reduced expression of several activation 
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(E11.5) for mice during 


markers—including CD25, GITR, CD103, and 
OX40—which is consistent with ex-Tyeg cells 
identified in other contexts (27, 29, 30), and 
near-identical levels between cells with fetal- 
2WIS specificity and tetramer-negative cells 
(fig. S7). Expression of other molecules asso- 
ciated with T,.¢ cell suppression, including 
interleukin-10 (IL-10) and CD39, was similarly 
reduced, with IL-10 reductions accentuated 
among fetal-2WIS cells (fig. S7). In turn, tumor 
necrosis factor-a (TNFa) and interferon-y 
(IFNy) production by fetal-specific ex-T,eg cells 
was not increased compared with FOXP3*" cells, 
whereas these proinflammatory cytokines were 
increased among tetramer-negative ex-T,.g cells 
compared with FOXP3* cells or fetal-2W1S ex- 
Treg cells (fig. S7). Expression of PD1 increased 
similarly among ex-T; eg cell compared with 
FOXP3* cells regardless of specificity, whereas 
expression of other molecules associated with 
Treg cell suppression (CTLA4 and CD73) re- 
mained elevated among fetal-2WIS ex-Tyeg cell 
compared with FOXP3 cells, highlighting re- 
tained functional distinctions despite repressed 
FOXP3 expression by cells with fetal-2WI1S spe- 
cificity. The proportion of fetal-2W1S-specific 
€X-Tyeg Cells further increased after OVA:2W1S* 
FMc depletion with use of anti-OVA antibodies 
or after pregnancy-induced displacement 
(Fig. 4, A and B), with sustained reductions in 
CD25, GITR, CD103, and OX40 expression and 
increased CD127 expression (Fig. 4C). Thus, a 
substantial proportion of pregnancy-primed 
fetal-specific T,.¢ cells lose FOXP3 expres- 
sion and become phenotypically distinct ex- 
Treg cells after parturition, with even more 
exaggerated ex-T, cg cell differentiation after 
FMc depletion. 

Treg cells that reexpand with fetal restimu- 
lation can originate from memory FOXP3* cells 
or ex-Tyeg cells. These possibilities were eval- 
uated by comparing how depletion of postpar- 
tum FOXP3* cells affects fetal-specific T,,.g cell 
reexpansion. Purified CD4 cells from congeni- 
cally discordant FOXP3”"® (31) and FOXP3"" 
mice 25 to 30 days after primary pregnancy 
sired by OVA:2WI1S* males were co-transferred 
into virgin recipients (DTR, diphtheria toxin 
receptor; WT, wild type), and diphtheria toxin 
(DT) was administered within the first 48 hours 
to selectively eliminate preexisting FOXP3" cells 
among the FOXP3”7® donor pool. We reasoned 
that if reexpanded T,.¢ cells primarily orig- 
inate from memory FOXP3* cells, then deplet- 
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primary pregnancy sired by OVA:2W1S* H-2% BALB/c males adoptively 
transferred donor splenocytes 1 day prior (E10.5) from isogenic virgin mice 
(black), PP FOXP3"" (blue), PP FOXP3°"® (red filled) (depletion of donor 

FOXP3* cells), or PP FOXP3(GFP)°** R26iDTR (open red) (depletion of FOXP3* cells 
and ex-Treg cells) and administered DT for two consecutive days (E10.5 and 
E11.5). Each point indicates the data from an individual mouse and is 
representative of at least three independent experiments, each with similar 
results. Bar, mean + standard error. 


ing these cells before secondary pregnancy 
would blunt their expansion with fetal-2W1S 
restimulation. In agreement, FOXP3* cells were 
reduced among fetal-2W1S CD4 FOXP3°7® 
compared with FOXP3”" donor cells (Fig. 4D). 
To further investigate whether ex-Ty.. cells can 
bypass this necessity for memory FOXP3* 
cells, this approach was modified by depleting 
OVA:2W1S* FMcs in FOXP3°™® and FOXP3" 
donor mice before transfer. We reasoned that | 
if FOXP3* cells also reexpand from ex-Tyeg 
cells, enhanced ex-Tyeg cell differentiation with 
FMc depletion would rescue FOXP3* cell ex- 
pansion among FOXP3°"® donor cells. In agree- 
ment, FOXP3* cells rebounded among FOXP3?7® 
donor mice who were given anti-OVA IgG before 
transfer, and to levels similar to those of donor 
FOXP3™" cells that were depleted of OVA:2W1S* 
FMcs before transfer (Fig. 4D), which highlights 
the shared participation of memory FOXP3* 
cells and ex-T,.g cells in maternal Tye cell re- 
expansion with fetal antigen restimulation. 
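The fate-mapping readout used in these experiments reduces to a two-channel rule: dTOMATO records that a cell has expressed FOXP3 at some point, whereas FOXP3(GFP) reports current expression. The minimal Python sketch below encodes that rule and computes the share of FOXP3-lineage (dTOMATO+) cells that have become ex-Treg cells. It assumes each cell has already been gated as positive or negative for each reporter; the function names and the toy input are hypothetical illustrations, not data or code from this study.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify(tomato_pos: bool, gfp_pos: bool) -> str:
    """Assign a lineage category from two pre-gated reporter channels.

    dTOMATO: permanent label for cells that have expressed FOXP3 (lineage).
    FOXP3(GFP): current FOXP3 expression.
    """
    if tomato_pos and gfp_pos:
        return "Treg"               # FOXP3 lineage, still expressing FOXP3
    if tomato_pos:
        return "ex-Treg"            # FOXP3 lineage, FOXP3(GFP) now off
    if gfp_pos:
        return "FOXP3+ unlabeled"   # not expected if labeling were complete
    return "non-FOXP3 lineage"


def ex_treg_fraction(cells: Iterable[Tuple[bool, bool]]) -> float:
    """Share of dTOMATO+ cells that are FOXP3(GFP)-negative (ex-Treg)."""
    counts = Counter(classify(tomato, gfp) for tomato, gfp in cells)
    labeled = counts["Treg"] + counts["ex-Treg"]  # all dTOMATO+ cells
    if labeled == 0:
        raise ValueError("no dTOMATO+ cells to evaluate")
    return counts["ex-Treg"] / labeled


if __name__ == "__main__":
    # Toy counts for illustration only (not measurements from the study).
    toy = [(True, True)] * 6 + [(True, False)] * 4 + [(False, False)] * 5
    print(ex_treg_fraction(toy))
```

With the toy input, 4 of the 10 dTOMATO+ cells are FOXP3(GFP)-negative, so the function returns 0.4; unlabeled cells are excluded from the denominator, mirroring how the ~40% figure is expressed relative to dTOMATO+ cells.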

To more definitively compare the contribution of postpartum FOXP3+ Treg cells and ex-Treg cells in enforcing fetal tolerance, splenocytes from postpartum donors were adoptively transferred to recipient mice midgestation during primary pregnancy, and the recipients were infected with Lm the following day (Fig. 4E). We reasoned that if systemic immune cells retained after parturition mediate resiliency against prenatal Lm infection, then unfractionated splenocytes from postpartum donors would transfer protection to recipient mice. In agreement, prenatal Lm infection susceptibility was sharply reduced in mice reconstituted with unfractionated donor splenocytes from mice after pregnancy sired by the same BALB/c H-2d males, compared with near-complete fetal wastage in mice reconstituted with unfractionated splenocytes from virgin donors (Fig. 4E), which is further increased compared with primary allogeneic pregnancy without donor splenocyte transfer (fig. S8). Extending this approach by using splenocytes from postpartum FOXP3^DTR mice to selectively deplete donor FOXP3+ cells, or from FOXP3(GFP)^cre R26iDTR mice for concurrent elimination of donor ex-Treg cells plus FOXP3+ Treg cells, with DT administration to recipient mice, showed that protection against prenatal Lm infection is retained despite near-complete (>95%) depletion of donor FOXP3+ Treg cells (Fig. 4E and fig. S9). By comparison, protection is lost when postpartum donor splenocytes are simultaneously depleted of ex-Treg cells plus FOXP3+ Treg cells, even though combined depletion of these cells occurred with slightly reduced efficiency (~89%) (Fig. 4E and fig. S9). Thus, pregnancy-primed maternal CD4 cells with plasticity in FOXP3 expression are essential for enhanced resiliency against complications in future pregnancy.


Discussion 


Although ex-Treg cells are increasingly recognized in autoimmunity (27-29, 32), our findings also highlight reproductive benefits of FOXP3 expression plasticity. Given the importance of reproductive fitness in trait selection, two findings extend the applicability of FOXP3 plasticity to this more physiologically imperative context: ~40% of FOXP3+ cells primed by fetal antigen stimulation during pregnancy lose FOXP3 expression after parturition, with ex-Treg cell differentiation further enhanced by FMc depletion; and ex-Treg cells plus FOXP3+ cells retained after parturition are jointly necessary for averting complications in future pregnancy. Ex-Treg cells are broadly subdivided into two non-mutually exclusive subsets: cells that lose suppressive function and adopt proinflammatory phenotypes, and committed suppressor cells that remain latently poised for FOXP3 reexpression upon T cell receptor stimulation (28, 30). Rapid reaccumulation of FOXP3+ cells with fetal restimulation indicates that postpartum ex-Treg cells retain latent potential for FOXP3 reexpression in the tolerogenic pregnancy context. However, pregnancy-induced sensitization to fetal alloantigens, which can negatively affect the outcomes of subsequent offspring-to-mother or father-to-mother allograft transplantation (33-35), suggests that pregnancy-induced ex-Treg cells may also adopt proinflammatory phenotypes under less tolerogenic conditions. Because reproduction is the primary means of alloantigen exposure, pregnancy-induced changes provide an instructive framework for further investigating lineage plasticity versus heterogeneity of FOXP3+ cells, and the context-specific environmental or ex-Treg cell-intrinsic features that control durable suppressive function versus reprogramming into proinflammatory phenotypes.

From the perspective of first siblings, imprinting tolerance in mothers that selectively promotes survival of younger siblings with similar genetic traits makes sense teleologically. FMcs represent antigenic sentinels that maintain expansion of Treg cells with specificity to fetal-expressed antigens from prior pregnancy, analogous to how MMcs maintain expanded NIMA-specific FOXP3+ Treg cells in offspring (22). Together, these observations highlight the need for antigenic reminders to numerically sustain Treg cells, similar to other memory CD4 cell subsets (13-15). However, considering the unexpectedly fixed physiological niche that allows females to contain only one dominant set of microchimeric cells at a time, differentiation into latent ex-Treg cells represents an additional fail-safe mechanism by which mothers more durably remember fetal antigens encountered in all prior pregnancies. Physiological displacement of preexisting MMcs by FMcs, or of FMcs retained from first pregnancy by FMcs in each subsequent pregnancy, explains the wide heterogeneity in recovery of microchimeric cells when parity and paternity are not considered (36). Pioneering studies that control for these parameters in women and show reduced MMc and nonincreasing FMc levels with increasing parity suggest that the fixed microchimeric cell niche we demonstrate in mice is conserved in humans (37). Because we exclusively evaluated pregnancy outcomes and microchimerism in mice, important next steps will be to evaluate more definitively, in women and other mammalian species, how parity affects MMc levels and the persistence of NIMA-specific tolerance, along with FMcs associated with first pregnancy in primiparous compared with multiparous women. Additional questions remain regarding why the repertoire of exceptionally rare microchimeric cells is not expandable, especially considering their numerical and cellular interindividual variability during embryonic development (38, 39) and their protective role against early life infection (40). Given increased circulating MMc levels in women during pregnancy (41) and the recent identification of grandmaternal microchimeric cells in first-pregnancy specimens from cord blood (42), purposeful MMc displacement to further extend tolerance to grandchildren represents a provocative explanation that requires further evaluation. Collectively, these results regarding how pregnancy affects the outcomes of future pregnancies in mothers and their daughters reveal distinctions in how familial antigens are immunologically remembered and how tolerance to fetal alloantigens is naturally optimized to improve pregnancy outcomes.
Structural basis for inactivation of PRC2 by G-quadruplex RNA
Polycomb repressive complex 2 (PRC2) silences genes through trimethylation of histone H3K27. PRC2 associates with numerous precursor messenger RNAs (pre-mRNAs) and long noncoding RNAs (lncRNAs) with a binding preference for G-quadruplex RNA. In this work, we present a 3.3-Å-resolution cryo-electron microscopy structure of PRC2 bound to a G-quadruplex RNA. Notably, RNA mediates the dimerization of PRC2 by binding both protomers and inducing a protein interface composed of two copies of the catalytic subunit EZH2, thereby blocking nucleosome DNA interaction and histone H3 tail accessibility. Furthermore, an RNA-binding loop of EZH2 facilitates the handoff between RNA and DNA, another activity implicated in PRC2 regulation by RNA. We identified a gain-of-function mutation in this loop that activates PRC2 in zebrafish. Our results reveal mechanisms for RNA-mediated regulation of a chromatin-modifying enzyme.


Many nuclear proteins that bind chromatin also bind RNA molecules (1-3). The binding of RNA has been suggested to facilitate both positive and negative regulation (e.g., recruitment to target sites and enzymatic inhibition, respectively). Polycomb repressive complex 2 (PRC2) is a prominent example of a chromatin modifier known to be regulated by RNA (4, 5). PRC2 is essential for embryonic development and cell differentiation (6, 7). Some tumors are PRC2 dependent (e.g., because of silencing of tumor suppressor genes), making PRC2 a target for cancer therapeutics (8). PRC2 consists of four core protein components: EZH2 (catalytic subunit), EED [binds histone H3 trimethylated at lysine 27 (H3K27me3)], SUZ12 (provides a platform), and RBAP48 (7). Associating cofactors define two PRC2 subclasses (9, 10), of which PRC2.2, containing AEBP2 and JARID2, is the subject of this study.

PRC2 binds numerous pre-mRNA and long noncoding RNA (lncRNA) transcripts in vitro and in vivo (11-13). This broad RNA recognition can be explained at least in part by PRC2 preferring an RNA G-quadruplex (G4) motif (14-16), which could be ubiquitous in the transcriptome through intramolecular and perhaps even intermolecular assemblies (17). Proposed models of RNA regulation of PRC2 remain disparate. First, in the "handoff" model, PRC2 requires RNA for recruitment and occupancy on a specific subset of targeted chromatin (18, 19). The direct handoff from RNA to DNA is an intrinsic property of PRC2, as shown by recent biophysical analyses (20). Second, the "eviction" model suggests that nascent RNA removes PRC2 from actively transcribed chromatin to restrict nonspecific activity (14, 21-23). Third, in the "inhibitor" model, RNA and nucleosome binding of PRC2 are mutually exclusive, so RNA serves as a direct competitor to prevent PRC2 action (14, 18, 22, 24). Another version of the inhibitor model proposes that RNA exploits a regulatory site on PRC2 to abolish H3K27me3 binding to EED, which consequently eliminates allosteric activation of EZH2 (25). Therefore, structural details of the PRC2-RNA interaction have been needed in the field to provide mechanistic insights and reconcile those models.

Cryo-electron microscopy (cryo-EM) and x-ray 
crystallography have provided visualization of 
both substrate-free and nucleosome-bound 
PRC2 complexes (26-34). However, solving a 
structure of a PRC2-RNA complex has proved 
challenging. A streptavidin-biotin-affinity EM 
grid approach has been successfully used in 
cryo-EM (35), and here we adapt this technique 
for ribonucleoprotein (RNP) complexes using 
biotinylated RNA. We found that PRC2 can 
dimerize following RNA binding with a protein- 
protein interface composed of EZH2 CXC 
domains. The structure provides a molecular 
explanation for how RNA acts as a PRC2 in- 
hibitor, and it suggests a mechanism for RNA 
facilitation of PRC2 recruitment. 


Structure of G-quadruplex RNA-mediated PRC2 dimer


We prepared six-subunit PRC2.2 complexes with full-length EZH2, SUZ12, RBAP48, EED, embryonic short-isoform AEBP2, and truncated JARID2(119-450) (Fig. 1A), as well as two G4-forming RNAs (Fig. 1B) that bind PRC2 in vitro (fig. S1).

RNA-bound PRC2 complexes and also protect them from the hydrophobic water-air interface. Because this method had not been applied to an RNP complex, we validated it by testing different RNA concentrations with the same excess of PRC2. The number of particles observed by negative staining was proportional to the RNA concentration in most fields (Fig. 1C), indicating that the vast majority of protein complexes on the grid are bound to RNA. Compared with RNA-free PRC2 (particles ~150 Å in diameter), the majority of two-dimensional (2D) class averages from 1G4- and 2G4-bound PRC2 complexes were larger (particles ~250 Å in diameter), containing two recognizable PRC2 complexes (Fig. 1D).

We determined the cryo-EM structure of the dimeric PRC2-1G4 RNA complex at 3.4-Å resolution from consensus refinement, and at 3.3 Å from multibody refinement (36) (figs. S2 and S3 and table S1). The two PRC2 protomers in the RNP complex are nearly identical and have a conformation previously characterized as the SANT1 extended form (37) (Fig. 1E and fig. S4). We identified a protein interface in the RNA-induced dimer that is a localized EZH2-EZH2 interaction (described in the dimer interface section below) (movie S1).

Notably, this PRC2 dimer has imperfect C2 symmetry (fig. S5A) associated with differential occupancy of RNA in the two symmetric sites (fig. S5B). The stronger density has a volume representative of a G4 RNA (fig. S5, C and D, and fig. S6D), whereas the symmetric site has discontinuous density and is not discussed hereafter. We could not obtain high-quality RNA density for de novo modeling in either site from multibody refinement (fig. S5, C and D) or particle subtraction classification (fig. S6) (38). We attribute this to the multiple independent and flexible interactions between PRC2 and RNA. The G4 RNA is not nestled into the surface of the protein, as is typically seen for RNA-protein complexes, but instead appears to be separated from the protein. The RNA density is in closest proximity to the EZH2 SANT2 domain (residues 353 to 362), EZH2 CXC domain (residue R532), EZH2 SET domain (residue N697), the RRM (RNA recognition motif)-like domain of SUZ12, and an unstructured region of RBAP48 (residues 92 to 107) (movie S1). These binding sites, excluding RBAP48, are supported by previous in vitro and in vivo studies (25, 26, 39, 40). We were able to observe density that links regions proximal to PRC2 and G4 RNA at 3.9-Å resolution from particle subtraction and classification (described in the arginine-rich site section below).

The distance between the two PRC2 protomers is reduced from 51 to 48 Å on the side with stronger G4 density (Fig. 1E, bottom), which not only explains the imperfect symmetry but also supports the model of a single G4 being sufficient for PRC2 dimerization. Although the stoichiometry of most RNPs is 2 PRC2:1 RNA in our preparations, we do not reject the possibility of a PRC2 dimer engaging two independent G4 RNAs simultaneously.

Fig. 1. Overall structure of a PRC2-1G4 RNP complex. (A) (Left) Schematic of the proteins in the PRC2.2 six-subunit complex. Transparent N-terminal region of JARID2 was not included. (Right) Coomassie-stained gel of purified PRC2. (B) The two RNA oligonucleotides used in this study. (C) Negative-staining EM images of streptavidin-affinity grids with excess PRC2 and various 1G4 RNA concentrations. (D) Negative-staining EM provided 2D-class averages of PRC2 alone collected from continuous carbon grids and PRC2-G4 RNAs from streptavidin-affinity grids. (E) (Top) Cryo-EM density map of PRC2 bound to 1G4 RNA. EZH2 (CXC-SET) of protomer 1 in blue, EZH2 (CXC-SET) of protomer 2 in light blue, and 1G4 RNA in orange. (Bottom) The atomic model of PRC2 bound to 1G4 RNA. Distances between EZH2 (SANT2) and SUZ12 (RRM-like) are highlighted by black arrows to emphasize the change from 1G4-binding.

Fig. 2. G4 RNA induces PRC2 dimerization in solution. (Left) Size-exclusion chromatography of PRC2 preincubated with 1G4, 2G4, and mock (protein only). Abs, absorbance; mAU, milli-absorbance unit. (Right) Standard curves were used to estimate the molecular weights of PRC2 complexes.


G-quadruplex RNA induces PRC2 dimerization in solution


To validate the dependence of PRC2 dimerization on G4 RNA binding in solution, we used analytical size-exclusion chromatography and mass photometry. In the absence of RNA, our six-subunit PRC2 complex chromatographed as a monomer (Fig. 2), consistent with an absolute molecular weight of 340 kDa (fig. S7A). Incubating PRC2 with 18-kDa 1G4 RNA or 30-kDa 2G4 RNA led to a large RNP complex of approximately 720 kDa measured by both size-exclusion chromatography (Fig. 2) and mass photometry (fig. S7C), consistent with a dimer. In addition, native gel electrophoresis of PRC2-G4 RNP cross-linked with glutaraldehyde indicated that G4 RNA remains bound as part of a cross-linked complex (fig. S7D). Furthermore, negative-staining EM reference-free 2D class averages of cross-linked complexes on conventional carbon-support grids confirmed the presence of RNA-mediated PRC2 dimers as observed in the non-cross-linked streptavidin-affinity grids (fig. S7E). Together, our results verified the requirement of RNA for this specific PRC2 dimerization in solution and provided confidence that the dimer was not an artifact of streptavidin-affinity selection.
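The stoichiometric reasoning above can be checked with simple arithmetic. The Python sketch below uses only the masses quoted in the text (PRC2 ~340 kDa, 1G4 18 kDa, 2G4 30 kDa, RNP ~720 kDa) to estimate how many PRC2 copies the measured complex contains. It also suggests why mass alone cannot settle the number of RNA molecules: one G4 RNA adds only a few percent to a two-protomer complex, which is within the approximation of the ~720-kDa estimate. The function names are illustrative assumptions, not part of the study's analysis.

```python
PRC2_KDA = 340.0                      # monomeric six-subunit PRC2 (from the text)
RNA_KDA = {"1G4": 18.0, "2G4": 30.0}  # RNA masses quoted in the text
OBSERVED_KDA = 720.0                  # approximate RNP mass (SEC and mass photometry)


def prc2_copies(observed_kda: float, rna_kda: float, n_rna: int = 1) -> float:
    """Estimated PRC2 copy number after subtracting n_rna RNA molecules."""
    return (observed_kda - n_rna * rna_kda) / PRC2_KDA


def rna_mass_share(n_prc2: int, n_rna: int, rna_kda: float) -> float:
    """Fraction of the total complex mass contributed by RNA."""
    total = n_prc2 * PRC2_KDA + n_rna * rna_kda
    return n_rna * rna_kda / total


if __name__ == "__main__":
    for name, kda in RNA_KDA.items():
        copies = prc2_copies(OBSERVED_KDA, kda)
        share = rna_mass_share(2, 1, kda)
        print(f"{name}: ~{copies:.2f} PRC2 copies; RNA is {share:.1%} of a 2:1 complex")
```

Running it gives about 2.06 and 2.03 PRC2 copies for 1G4 and 2G4, with RNA contributing about 2.6% and 4.2% of a 2:1 complex. Assuming two RNAs instead of one still rounds to two PRC2 copies, which is why the RNA count needs independent evidence such as the binding assays below.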

To test the specificity of G4 structure in me- 
diating PRC2 dimerization, we used microscale 
thermophoresis (MST) to measure PRC2-RNA 
binding in different reaction conditions (fig. 
S8). In a G4-favoring KCl buffer, the MST profile 
showed two distinguishable stages of thermo- 
phoretic mobility. This biphasic binding curve 
is typical for two binding events (41), which 
suggests that a higher-affinity binding site on 
PRC2 is primarily occupied at low PRC2 con- 
centrations (1 PRC2:1 RNA), and at higher PRC2 
concentrations, lower-affinity binding of a sec- 
ond PRC2 follows (2 PRC2:1 RNA). In addition, 
we performed the same MST assays in a G4- 
destabilizing LiCl buffer or using a G-rich 
single-stranded RNA with no G4-forming po- 
tential. Neither experiment gave a distinct bi- 
phasic curve, indicating that PRC2 dimerizes 
specifically on RNA containing at least one 
G4 motif. 
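A sequential two-site binding scheme is one simple way to see why a biphasic curve points to two binding events. In the Python sketch below, labeled RNA (R) first binds one PRC2 (P) with dissociation constant kd1 and then a second PRC2 with kd2 (R + P ⇌ RP, then RP + P ⇌ RP2). It assumes PRC2 is in large excess, so free PRC2 approximates total PRC2, and that the 1:1 and 2:1 species give different signal amplitudes. The Kd values and amplitudes in the example are hypothetical and are not fitted values from fig. S8.

```python
from typing import List, Tuple


def species_fractions(prc2_nm: float, kd1_nm: float,
                      kd2_nm: float) -> Tuple[float, float, float]:
    """Fractions of RNA that are free, 1 PRC2:1 RNA, and 2 PRC2:1 RNA.

    Sequential scheme: R + P <-> RP (kd1_nm), RP + P <-> RP2 (kd2_nm).
    Assumes free PRC2 is approximately equal to total PRC2 (PRC2 in excess).
    """
    w1 = prc2_nm / kd1_nm                  # statistical weight of RP
    w2 = prc2_nm ** 2 / (kd1_nm * kd2_nm)  # statistical weight of RP2
    z = 1.0 + w1 + w2                      # sum of weights (free RNA = 1)
    return 1.0 / z, w1 / z, w2 / z


def normalized_signal(prc2_nm: float, kd1_nm: float, kd2_nm: float,
                      amp_1to1: float = 0.5, amp_2to1: float = 1.0) -> float:
    """Signal change if each bound species contributes a distinct amplitude."""
    _, f1, f2 = species_fractions(prc2_nm, kd1_nm, kd2_nm)
    return amp_1to1 * f1 + amp_2to1 * f2


def titration(kd1_nm: float, kd2_nm: float) -> List[Tuple[float, float]]:
    """Signal across a half-log PRC2 series from 0.1 nM to 100 uM."""
    concs = [10 ** (k / 2) for k in range(-2, 11)]
    return [(c, normalized_signal(c, kd1_nm, kd2_nm)) for c in concs]


if __name__ == "__main__":
    # Hypothetical constants chosen only to separate the two phases clearly.
    for conc, sig in titration(kd1_nm=10.0, kd2_nm=1000.0):
        print(f"{conc:10.1f} nM  signal {sig:.3f}")
```

With kd1 = 10 nM and kd2 = 1000 nM, the signal passes through a shoulder of 0.5 at 100 nM PRC2 (mostly 1:1 complex) before rising toward 1.0 as the 2:1 complex accumulates. Making kd2 very large, a crude stand-in for the LiCl and non-G4 controls, removes the second step and leaves a single hyperbolic phase.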
The dimer interface prevents nucleosome and H3 tail binding
The PRC2-1G4 RNA dimer has three features 
that would be expected to inhibit nucleosome 
binding and histone methylation. First, the 
residues within the CXC domain of EZH2 in- 
teract with the CXC of the second protomer 
to form the dimer interface. This dimer inter- 
face includes two R566-A569 hydrogen bonds, 
two R566-T573 hydrogen bonds, two Q575- 
G564 hydrogen bonds, and a K568-K568 hy- 
drophobic interaction (Fig. 3A). By contrast, 
in a nucleosome-bound PRC2, the CXC domain 
facilitates the catalytic activity of the adjacent 
SET domain of EZH2, specifically with R566, 
K568, T573, and Q575 contributing to interac- 
tions with nucleosome DNA and the H3 tail 
(Fig. 3B) (27). The disparate functions of the 
CXC domain in these different PRC2 structures 
are seen by the superposition of our density map
onto the nucleosome-bound PRC2, which shows 
clashes with both the DNA and the H3 tail 
(Fig. 3C and movie S2). Therefore, we propose 
that nucleosome binding and H3 tail loading, 
both of which are essential for histone methyltransferase (HMTase) activity, are mutually
antagonistic with RNA-mediated PRC2 di- 
merization. The disruption of nucleosome-PRC2 
complexes by 1G4 RNA was confirmed by a 
competition-binding assay in solution (Fig. 3D). 

Second, the EZH2 bridge helix (residues 500 
to 516)—important for nucleosome DNA bind- 
ing and channeling H3 tail into the active site 
of the EZH2 SET domain (27)—is disordered 
in both protomers of our RNP complex (fig. S9). This is consistent with structures of PRC2
lacking nucleosome substrate. Lastly, the EZH2 
C-terminal helix (residues 738 to 742), which 
points away from the H3K27 binding site in 
nucleosome-bound PRC2, now occludes the 
active site in RNA-bound PRC2 (fig. S9) and 
appears to serve as an additional mechanism 
to prevent H3 tail binding. 

To test the importance of the CXC dimer 


interface, we purified a mutant (EZH2 R566A_ - 


K568A Q575A). Mutation of these residues did 
not affect G4 RNA binding (fig. S10, A and B) 
or prevent RNA-induced dimerization (fig. S10, 
C and D). However, negative-staining EM 
with streptavidin-affinity grids classified sub- 
stantially more monomer-size particles from 
the mutant (59%) than the wild-type (WT) PRC2 
(9%) (Fig. 3E and fig. SIOE). This suggests that 
these EZH2 mutations impacted the overall 
stability of the PRC2 dimer by destabilizing 
the protein-protein interaction, and consequent- 
ly, one protomer more easily dissociated dur- 
ing stringent washes (dilutions) in our grid 
preparations compared with the WT. Because 
streptavidin-affinity grids only retain RNA-bound 
complexes, those monomer particles of the mu- 
tant could also represent an intermediate stage 
of one PRC2 engaging RNA prior to associa- 
tion of the second PRC2, which is consistent 
Fig. 3. RNA-mediated PRC2 dimer is an inactive complex. (A) Cryo-EM structure of PRC2-1G4 complex with zoom-in to show the interface of two EZH2 CXC domains. Interacting residues (R566, K568, T573, and Q575) are highlighted in the stick representation. One set of hydrogen bonds (R566NE-T573OG1, 2.70-Å distance; R566NH1-A569O, 2.73-Å distance; and Q575NE2-G564O, 2.51-Å distance) is indicated by yellow dashed lines. The other set of hydrogen bonds between the same amino acid pairs in the second PRC2 protomer is not shown for clarity. Another view of the same region is shown in fig. S3G. (B) Structure of PRC2-nucleosome complex (PDB: 6WKR) with zoom-in to emphasize the CXC interactions with nucleosome H3 tail (orange). The same residues as shown in (A) are highlighted in stick representation. (C) Superposition of the EZH2 CXC domain of the RNA-bound PRC2 on the nucleosome-bound PRC2 to emphasize disparate functions of the CXC domain. (D) Nucleosome-RNA competition assay. PRC2 was incubated with a constant amount of radiolabeled trinucleosome and serial dilutions of 1G4 RNA. Incomplete PRC2-trinucleosome complexes are indicated by * and **. We assume that two of three nucleosomes were occupied by PRC2 in * and one of three nucleosomes in **. (E) Negative-stain EM to quantify monomer and dimer particles of EZH2 R566A K568A Q575A binding 1G4. The number of particles in each class is indicated above each bar. (F) Binding affinity of 1G4 RNA or dsDNA to WT PRC2 and EZH2 R566Y K568Y Q575Y (3Y) measured by FP. We used a reaction buffer with a lower [...] Kd, dissociation constant. (G) Methyltransferase activities from figs. S12 to S15 are plotted against reaction times for PRC2 (400 nM) preincubated with 1G4 (400 nM), 2G4 (400 nM), and mock (protein only). Error bars represent standard deviations of three replicates performed on different days. (H) Methylation of trinucleo-
Fig. 4. EZH2 loops physically contact G4 RNA and contribute to direct 
handoff from RNA to DNA. (A) Map of PRC2-1G4 RNA from particle subtraction 
and classification (fig. S6) with zoom-ins to emphasize the observed physical 
interactions of PRC2 and G4 RNA. (B) EZH2 structure from AlphaFold predicts 
two disordered loops of EZH2. Arginine-rich loop [EZH2(353-362)] and lysine-rich 
loop [EZH2(494-502)] are indicated in blue and green, respectively. (C) FP assays to 
monitor the transfer kinetics of PRC2 from fluorescently labeled 1G4 RNA to a 


with the biphasic binding curve observed in 
our MST assays. 

We next attempted to disrupt the CXC interface more severely by substituting bulky tyrosine side chains, constructing an EZH2 R566Y K568Y Q575Y mutant. Unexpectedly, this mutant had higher binding affinity for the G4 RNAs, as determined by fluorescence polarization (FP) assays (Fig. 3F, left) and electrophoretic mobility-shift assays (EMSA) (fig. S11, A and B). Notably, double-stranded DNA (dsDNA) binding of this mutant was not affected (Fig. 3F, middle and right), which further indicates that DNA and RNA use separate mechanisms to engage PRC2 even though their binding is mutually antagonistic. Although this mutant PRC2 alone is a monomer as observed by negative-stain EM, RNA-bound particles showed a dominant population of dimer complexes, which is consistent with the increased RNA-binding affinity and the role of RNA in mediating PRC2 dimerization (fig. S11, C and D). We propose that the aromatic side chains of tyrosine might stack on each other and therefore stabilize the CXC interface. Thus, the dimerization interface need not be specific, and it appears to be RNA binding rather than protein-protein interaction that drives PRC2 dimerization. Overall, we observed a positive correlation between the strength of the CXC dimer interface and G4 RNA binding. 
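The FP measurements above report binding affinity as a dissociation constant (Kd). As a rough illustration of how an apparent Kd can be extracted from such a titration, the sketch below fits a single-site hyperbolic binding model by a log-spaced grid search. It assumes that the labeled RNA is well below Kd (no ligand depletion) and that the FP signal has already been normalized to fraction bound; the function names and the example titration are hypothetical and do not reproduce the authors' fitting procedure.

```python
import math


def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """Single-site binding isotherm, assuming negligible ligand depletion."""
    return protein_nm / (kd_nm + protein_nm)


def fit_kd(concs_nm, signal, lo_nm=0.1, hi_nm=10_000.0, steps=4000):
    """Return the Kd (nM) on a log-spaced grid that minimizes the squared error.

    `signal` is assumed to be normalized: 0 = free RNA, 1 = fully bound.
    """
    log_lo, log_hi = math.log10(lo_nm), math.log10(hi_nm)
    best_kd, best_sse = None, math.inf
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        sse = sum((fraction_bound(c, kd) - y) ** 2 for c, y in zip(concs_nm, signal))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


if __name__ == "__main__":
    # Hypothetical, noise-free titration simulated with Kd = 50 nM (illustration only).
    concs = [1, 3, 10, 30, 100, 300, 1000]
    data = [fraction_bound(c, 50.0) for c in concs]
    print(f"fitted Kd ~ {fit_kd(concs, data):.1f} nM")
```

A lower fitted Kd means tighter binding, which is the sense in which the tyrosine mutant showed "higher binding affinity". A fuller analysis would usually also fit the signal baseline and plateau, or use the quadratic binding equation when the probe concentration is not negligible.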


RNA-induced PRC2 dimer is inactive 


Structural observations on the EZH2 CXC interface prompted us to hypothesize that the HMTase activity of the G4-induced PRC2 dimer would be inhibited. To test this, we performed activity assays to compare free PRC2 with RNA-sequestered dimers (Fig. 3G and figs. S12 to S15). As expected, we detected substantial reductions in methylation rates with all substrates (including recombinant H3) in response to RNA binding, with stronger inhibition by the higher-affinity 2G4 RNA (Fig. 3G). The extent of inhibition was limited by the RNA concentration, because complete inhibition was achieved with excess 1G4 RNA (Fig. 3H and fig. S16A). 
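dsDNA competitor. The ratio $k_{\mathrm{DT}}/k_{-1}$ of the PRC2 RNA-to-dsDNA direct-transfer rate constant ($k_{\mathrm{DT}}$) to the PRC2-RNA dissociation rate constant ($k_{-1}$) provides a measure of the propensity of PRC2 to exchange these ligands by the direct-transfer mechanism. WT PRC2 had $k_{\mathrm{DT}} = 90 \pm 11\ \mathrm{M^{-1}\,s^{-1}}$, $k_{-1} = (5.6 \pm 0.49) \times 10^{-4}\ \mathrm{s^{-1}}$, and $k_{\mathrm{DT}}/k_{-1} = (1.7 \pm 0.35) \times 10^{5}\ \mathrm{M^{-1}}$. EZH2 Δ353-362 had $k_{\mathrm{DT}} = 48\ \mathrm{M^{-1}\,s^{-1}}$, $k_{-1} = 12 \times 10^{-4}\ \mathrm{s^{-1}}$, and $k_{\mathrm{DT}}/k_{-1} = 0.4 \times 10^{5}\ \mathrm{M^{-1}}$. EZH2 CR had $k_{\mathrm{DT}} = 130 \pm 62\ \mathrm{M^{-1}\,s^{-1}}$, $k_{-1} = (2.7 \pm 0.63) \times 10^{-4}\ \mathrm{s^{-1}}$, and $k_{\mathrm{DT}}/k_{-1} = (4.6 \pm 1.3) \times 10^{5}\ \mathrm{M^{-1}}$. $k_{\mathrm{off}}^{\mathrm{obs}}$, observed dissociation rate constant. 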
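A common way to separate the two rate constants in a direct-transfer experiment is to measure the observed rate of PRC2 loss from labeled RNA at several competitor DNA concentrations and fit a pseudo-first-order model, k_obs = k_-1 + k_DT[DNA], where k_-1 (intercept) is spontaneous dissociation and k_DT (slope, here used as a label for the direct-transfer rate constant) is the second-order transfer term. The sketch below assumes that linear model, which is not spelled out in the text, and fits it by ordinary least squares. The DNA concentrations are hypothetical, and the "observed" rates are simulated from the WT constants in the legend rather than taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_obs(dna_molar, k_minus1, k_dt):
    """Assumed pseudo-first-order model: k_obs = k_-1 + k_DT * [DNA] (units: 1/s)."""
    return k_minus1 + k_dt * dna_molar


if __name__ == "__main__":
    # Rates simulated from the WT constants (k_-1 = 5.6e-4 /s, k_DT = 90 /M/s);
    # the competitor DNA concentrations (in M) are hypothetical.
    dna = [0.0, 1e-6, 2e-6, 5e-6, 10e-6]
    rates = [k_obs(c, 5.6e-4, 90.0) for c in dna]
    k_minus1_fit, k_dt_fit = fit_line(dna, rates)
    print(f"k_-1 = {k_minus1_fit:.2e} /s, k_DT = {k_dt_fit:.1f} /M/s")
    print(f"k_DT / k_-1 = {k_dt_fit / k_minus1_fit:.2e} /M")
```

Dividing the WT point estimates gives about 1.6 × 10^5 M^-1, close to the reported (1.7 ± 0.35) × 10^5 M^-1; the small gap presumably reflects how the ratio and its error were derived from replicate fits.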


An arginine-rich site of EZH2 binds G4 RNA 
and participates in RNA-to-DNA handoff 
Applying particle subtraction and classification (fig. S6), we identified multiple sites in EZH2 [EZH2(353-362): KRPGGRRRGR, EZH2 R532, and EZH2 N697] that physically contact G4 RNA (Fig. 4A). The arginine-rich EZH2(353-362) region has been implicated in binding lncRNA (25, 39), and similar arginine-rich sequences in multiple transcription factors have been linked to RNA binding (42). An EZH2 truncation [EZH2(Δ353-362)] and a local charge-reversed EZH2 [EZH2 CR(353-362): DEPGGEEEGE] both exhibited decreased HMTase activity but showed no obvious reduction in G4 RNA binding or G4 RNA-mediated PRC2 dimerization in vitro (fig. S17). We also generated a double-truncation mutant [EZH2(Δ353-362 Δ494-502)] to remove an adjacent lysine-rich site (EZH2 494-502) (Fig. 4B). EZH2 residues 494 to 502 were previously implicated in binding nucleosome DNA and G4 RNA (27, 40) and might compensate for RNA binding in the absence of EZH2 residues 353 to 362. PRC2 containing 
Fig. 5. EZH2 CR is a gain-of-function mutant in rescue of zebrafish development. (A) Representative images of injected zebrafish embryos at 48 hours post fertilization (hpf). Gross phenotypic scoring of anterior-posterior axis growth was sorted into three categories: normal, reduced, and severely reduced growth. (B) Scoring of anterior-posterior axis growth at 48 hpf. Zebrafish embryos were injected with 4 ng of ezh2-MO or the same amount of control MO. For rescue 

EZH2(Δ353-362 Δ494-502) exhibited a 1.5- to twofold reduction of binding affinity for G4 RNA (fig. S17G). We attribute this modest reduction to the presence of other RNA-binding regions observed in our structure and supported by previous studies (25, 40, 43). 

PRC2 has the intrinsic ability to transfer directly, or hand off, from RNA to DNA without ever passing through a free-enzyme intermediate (20, 44). We therefore examined whether the arginine-rich EZH2(353-362) region, in addition to its role in RNA binding, is important for such RNA-to-DNA handoff. We found that the EZH2(Δ353-362) and EZH2 CR mutants had a 4.3-fold reduction and a 2.7-fold increase, respectively, in the propensity for direct transfer from RNA to DNA (Fig. 4C). We rationalized these results with a model (fig. S18) in which PRC2 uses the arginine-rich EZH2(353-362) and lysine-rich EZH2(494-502) regions to form a ternary intermediate that binds both RNA and DNA. The Δ353-362 and CR mutations of this RNA-binding region affect the propensity for direct transfer in opposite directions. 
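The fold changes quoted above can be checked against the direct-transfer propensities listed in the Fig. 4C legend, read here as 1.7, 0.4, and 4.6 (× 10^5 M^-1) for WT, EZH2(Δ353-362), and EZH2 CR. The short arithmetic sketch below assumes those point values; the dictionary keys are illustrative labels, not identifiers from the paper.

```python
# Direct-transfer propensities (k_DT / k_-1) for Fig. 4C, in units of 1e5 per M.
# "del353-362" stands for the EZH2(delta 353-362) truncation.
RATIOS = {"WT": 1.7, "del353-362": 0.4, "CR": 4.6}


def fold_change(mutant: str, reference: str = "WT") -> float:
    """Return mutant / reference; < 1 means a reduction, > 1 an increase."""
    return RATIOS[mutant] / RATIOS[reference]


if __name__ == "__main__":
    for name in ("del353-362", "CR"):
        fc = fold_change(name)
        if fc < 1:
            print(f"EZH2 {name}: {1 / fc:.2f}-fold reduction")
        else:
            print(f"EZH2 {name}: {fc:.2f}-fold increase")
```

The computed values, about 4.25 and 2.71, round to the 4.3-fold reduction and 2.7-fold increase reported in the text.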


EZH2 CR is a gain-of-function mutant 
in zebrafish development 


We used zebrafish to examine the significance 
of the EZH2 G4 RNA-binding sites in verte- 
brate development. Zebrafish and human EZH2 
proteins have high sequence identity, including 
the regions responsible for G4 RNA binding 
and RNA-induced PRC2 dimerization (fig. S19A). 
EZH2 knockdown in zebrafish by antisense 
morpholino oligonucleotides (MOs) led to a 
severe growth defect (45) represented by gross 
alterations in the length of the anterior-posterior 
axis (Fig. 5A). As expected, coinjection with 
mRNA that encodes human WT EZH2 signif- 
icantly rescued the growth defects (P < 0.0001) 
(Fig. 5B and fig. S19B). Mutant EZH2 mRNAs 
rescued overall development to varying degrees 
(fig. S19C). Notably, the EZH2 CR mutant had a 
gain-of-function phenotype, giving significant- 
ly better rescue than that of WT EZH2 (P < 
0.01) and phenotypically mimicking a gain-of- 
function mutant (EZH2 Y646F) that is well- 
studied in the human system and frequently 
found in lymphoma (Fig. 5B) (46-48). Dose- 
response assays by coinjecting ezh2-MO and in- 
creasing amounts of mRNAs (Fig. 5C) confirmed 
that the extent of rescue was consistently similar 
between CR and Y646F, whereas the catalytically dead EZH2(Δ694-751) gave no rescue. 
Therefore, the EZH2 CR mutant, which shows 
an increased propensity for direct transfer be- 
tween RNA and DNA in vitro, also behaves as a 
gain-of-function mutant in zebrafish development. 
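The Fig. 5 legend states that the P values were obtained with Fisher's exact test. A minimal, dependency-free sketch of a two-sided Fisher's exact test on a 2 × 2 table is shown below. Collapsing the three phenotype categories into "normal" versus "reduced or severely reduced" is an assumption made here for simplicity (a 2 × 3 table would need the Freeman-Halton extension), and the embryo counts in the usage example are hypothetical.

```python
from math import comb


def table_prob(a: int, b: int, c: int, d: int) -> float:
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum P over all tables with the same margins
    that are no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    row2 = n - row1
    p_obs = table_prob(a, b, c, d)
    p_value = 0.0
    # The top-left cell x determines the whole table once the margins are fixed.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_obs * (1 + 1e-7):  # relative tolerance so ties are not lost to rounding
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical embryo counts, collapsed to normal vs. (severely) reduced growth:
    #              normal  reduced
    # WT rescue      30       20
    # CR rescue      42        8
    print(f"P = {fisher_exact_two_sided(30, 20, 42, 8):.4f}")
```

The "no more probable than observed" rule for the two-sided P value matches the common convention (for example, in SciPy), but other two-sided definitions exist.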


Discussion 


In the past 10 years, lncRNAs and pre-mRNAs have become prominent in discussions of PRC2 regulation (4, 5). Because of the broad PRC2-bound transcriptome (11, 12), deciphering molecular details of PRC2-RNA interaction has been challenging. PRC2 binds G4 RNA in vitro (14, 15, 23), PRC2 binding sites on chromatin genome wide are associated with G-tract motifs (15), 
experiments, 100 ng of mRNA encoding WT or mutated human EZH2 was coinjected with ezh2-MO. At least three clutches were examined for each injection. Fisher's exact test was used to determine the P values. (C) Dose-response experiments with 25, 50, and 100 ng of mRNA coinjected. Statistical analyses were performed as in (B) by comparing different doses with ezh2-MO alone. ns, not significant; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. 


and a well-defined G4-forming RNA, TERRA (telomeric repeat-containing RNA), recruits PRC2 to telomeres (16). Thus, we focused on G4 RNA in the present study. 

By using a biotinylated G4: RNA with streptavidin- 
biotin affinity grids, we determined the cryo-EM 
structure of an RNA-bound PRC2 complex. This 
structure supports the earlier conclusion that 
nucleosomal DNA and RNA binding are mutu- 
ally antagonistic (22, 24), but it provides a much 
more interesting mechanism than just competi- 
tion on overlapping sites. Instead, G4 RNA trig- 
gers formation of a PRC2 dimer that occludes 
the DNA-binding amino acids. Based on the pre- 
sent structure and biochemical and biophysical - 
data, we propose a model that can explain multi- 
faceted functions of RNA in PRC2 regulation. 

First, actively transcribed loci, which need 
to avoid silencing by PRC2, generate nascent 
RNA transcripts that have the potential to di- 
merize PRC2 among other RNA processing 
events. The RNA-induced dimer simultaneously 
inactivates two PRC2 complexes (no H3 tail 
binding) and evicts them from local chromatin 
(no nucleosome DNA binding). Residues of 
the EZH2 CXC domain, known to load the H3 
tail into the catalytic groove, are occupied in a 
protein-protein interaction that reinforces the 
dimerization. Dimerization may prevent the spreading of H3K27me3 across regions near preexisting H3K27me3 or JARID2 K116me3 marks, both of which induce allosteric activation of PRC2 (37, 49, 50), and therefore it may help define heterochromatin boundaries. 
Second, the interactions of PRC2 with RNA are proposed to be important for PRC2 occupancy on chromatin (9, 51). Consistent with this 
idea, PRC2 has the intrinsic ability to translocate 
onto target chromatin as it simultaneously dis- 
sociates from inhibitory RNA (20). To reconcile 
the role of RNA in inhibiting and evicting PRC2 
from chromatin and its role in promoting PRC2 
chromatin occupancy, we propose that EZH2 
harbors partially redundant nucleic acid bind- 
ing sites that allow PRC2 to transiently engage 
both RNA and DNA, thereby facilitating direct 
transfer from RNA to DNA. The EZH2 CR mu- 
tant, designed to destabilize RNA binding, en- 
hances the direct transfer of PRC2 from RNA 
to DNA in vitro. This CR mutant rescues the 
knockdown of PRC2 in zebrafish better than 
WT EZH2. This gain-of-function phenotype is 
consistent with a direct transfer from RNA to 
DNA, which facilitates PRC2 activity in vivo. 
However, the many differences between in vitro 
and in vivo experiments warrant a cautious in- 
terpretation. The precise mechanism of the 
EZH2 CR mutant gain-of-function merits fur- 
ther investigation. 

Our structure describes one mode of RNA 
recognition by PRC2, but there may very well 
be others. Other reported RNA-binding sites 
include the RNA-binding region (RBR) adja- 
cent to the bridge helix of EZH2 (residues 494 
to 502) (40), the stimulatory recognition motif 
(SRM) of EZH2 (residues 127 to 153) (25), the 
EED amino acids close to EZH2 SRM (residues 
336 to 355) (25), and the JARID2 RBR (residues 
332 to 358) (43). Although we did not obtain 
any subclass map having a distinguishable RNA 
density in proximity to those regions, this is not 
sufficient to reject their RNA-binding potential. 
We propose that the numerous RNA-binding 
regions within PRC2 explain why mutations 
give only modest effects on RNA binding in 
this and other studies. 

PRC2 has been shown to dimerize without 
RNA binding. We consistently find that, in the 
absence of RNA, a small fraction of PRC2 mol- 
ecules self-associate into dimers at high protein 
concentrations, as found for a four-subunit 
PRC2 holoenzyme (52) and the six-subunit 
PRC2.2 complex (fig. S20A). However, 2D class 
averages of self-associated dimers show a dif- 
ferent dimer interface than the RNA-induced 
dimer (fig. S20B). In addition, two reported 
domain-swapped PRC2 dimers—PRC2-PCL 
and PRC2:EZH1 (29, 32)—have completely dif- 
ferent architectures than our RNA-mediated 
dimer. 

Aside from PRC2, many chromatin-associated 
complexes have been found to interact with 
RNA, including other histone modifiers (2), 
transcription factors (42, 53, 54), and DNA 
methyltransferase (55, 56). Similar to PRC2, they 
tend not to have canonical RNA-recognition 
motifs and bind RNA broadly, and obtaining 
molecular structures of the RNA-protein com- 
plexes has been very challenging. Ultimately, 
solving additional structures of RNA bound to 
epigenetic modifiers will reveal the mechanisms 
by which RNA serves direct regulatory roles, 
rather than simply serving as a messenger. 
Axon regeneration can be induced across anatomically complete spinal cord injury (SCI), but robust 
functional restoration has been elusive. Whether restoring neurological functions requires directed 
regeneration of axons from specific neuronal subpopulations to their natural target regions remains 
unclear. To address this question, we applied projection-specific and comparative single-nucleus RNA 
sequencing to identify neuronal subpopulations that restore walking after incomplete SCI. We show 
that chemoattracting and guiding the transected axons of these neurons to their natural target region 
led to substantial recovery of walking after complete SCI in mice, whereas regeneration of axons simply 
across the lesion had no effect. Thus, reestablishing the natural projections of characterized neurons forms 
an essential part of axon regeneration strategies aimed at restoring lost neurological functions. 


The transected axons of injured central nervous system neurons can now be induced to regenerate through and across anatomically complete spinal cord injury (SCI) with multipronged treatments that reactivate latent growth programs and provide chemoattractive growth factors (1-4). Analogously, immature neural progenitors grafted into complete SCI lesions can attract host axons into lesions and can extend their axons out of lesions to grow extensively throughout the central nervous system (5, 6). However, despite the extensive axon regeneration achieved by these and related approaches (7, 8), reproducible restoration of function has been elusive, suggesting that essential mechanisms for restoring neurological functions have yet to be identified. 
Whether robust restoration of function will 
require targeting specific neurons and regen- 
erating the axons of these neurons not simply 
across lesions but also guiding them to reach 
their natural target region is unknown. To address this question, we studied neuronal subpopulations in the spinal cord that can restore 
walking by relaying supraspinal commands 
past severe but incomplete SCI (9-11). We 
hypothesized that regenerating the transected 
axons of neuronal subpopulations that are es- 
sential for recovery after incomplete SCI to 
reach simply across anatomically complete le- 
sions will fail to improve functional recovery, 
whereas chemoattracting and guiding the axons 
of these neurons to reach their distal natural 
target region in the lumbar spinal cord would 
mediate substantial recovery of walking. 


Characterizing neurons involved in 
natural recovery 


The lumbar spinal cord hosts the neuronal 
subpopulations that produce walking. Unilat- 
eral hemisections (Brown-Séquard syndrome) 
deprive these neurons of essential supraspinal 
inputs to produce walking on the injured side. 
Yet, both humans and animal models recover 
bilateral walking after these injuries (12, 13). 
Previous studies, including ours, showed that, 
in this scenario, neurons located in the mid- 
thoracic spinal cord relay supraspinal com- 
mands past the lateral hemisection to restore 
walking (9-11). Even after temporally and spa- 
tially separated, opposite-side hemisection le- 
sions that interrupt all direct projections from 
the brain to the lumbar spinal cord, these neu- 
rons can still relay sufficient supraspinal input to 
restore bilateral walking (9-11). Ablation of these 
neurons in the thoracic spinal cord did not alter 
walking in the absence of injury but eliminated 
the natural recovery of walking observed after 
unilateral or bilateral hemisections (9, 14). There- 
fore, we aimed to dissect the molecular and 
anatomical properties of the neuronal subpop- 
ulations underlying this natural recovery. 
with projections to the lumbar spinal cord, we injected recombinant adeno-associated virus 2 (rAAV2) encoding enhanced green fluorescent protein (eGFP) fused to the nuclear envelope protein KASH into the lumbar spinal cord of uninjured mice (fig. S1A). This strategy labeled the nuclei of neurons with direct projections to the lumbar spinal cord throughout the central nervous system, including putative relay neurons in the midthoracic spinal cord (fig. S1A), and enabled fluorescence-activated nuclei sorting coupled to single-nucleus RNA sequencing (snRNA-seq) of projection-specific neuronal subpopulations (Fig. 1A and fig. S1, B and C). We profiled the midthoracic spinal cord with snRNA-seq and obtained high-quality transcriptional profiles from 122 eGFP^ON and 2823 eGFP^OFF nuclei (fig. S1, D to I). Unsupervised clustering identified all of the major cell types of the spinal cord (fig. S1, J to M). We then subjected the neurons to a second round of clustering, which identified 28 subpopulations that expressed canonical marker genes (Fig. 1B and fig. S2A). Our taxonomy parcellated neurons into cardinal classes, including motor-sensory, local-long range, and excitatory-inhibitory subpopulations (fig. S2, B to D) (15). The 105 eGFP^ON neurons were primarily found within a single ventral neuronal subpopulation of thoracic neurons (Hoxa7) that expressed the marker Vsx2 plus Zfhx3, a marker of long-distance projecting Z-group neurons (15), which we named SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons (SC, spinal cord) (Fig. 1C and fig. S2, E and F). 
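The Fig. 1C legend describes coloring each nucleus by the proportion of its nearest neighbors that came from sorted projection neurons ("eGFP density"). A minimal sketch of that idea follows, assuming Euclidean distance in a two-dimensional embedding and a hypothetical neighborhood size k; the embedding, distance metric, and k actually used are not specified in the text.

```python
import math


def egfp_density(coords, is_egfp, k=3):
    """For each nucleus, return the fraction of its k nearest neighbors (excluding itself)
    that were sorted as eGFP-positive projection neurons.

    coords:  list of (x, y) embedding positions, e.g. UMAP coordinates
    is_egfp: list of booleans, True for nuclei from the eGFP-sorted library
    """
    densities = []
    for i, (xi, yi) in enumerate(coords):
        # Distances to every other nucleus, nearest first (brute force; fine for a sketch).
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        densities.append(sum(is_egfp[j] for j in neighbors) / len(neighbors))
    return densities


if __name__ == "__main__":
    # Hypothetical toy embedding: one labeled cluster and one unlabeled cluster.
    coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels = [True] * 4 + [False] * 4
    print(egfp_density(coords, labels, k=3))
```

Nuclei sitting among many sorted projection neurons score close to 1, which is how a subpopulation such as the one described above would stand out on the UMAP. Large datasets would use a spatial index (for example, a k-d tree) instead of the brute-force search.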
Because these neurons express Vsx2 (16, 17), they derive from developmentally defined V2a neurons. Vsx2-expressing neurons are found in different locations along the neuraxis, including the brainstem (18), cervical spinal cord (19-21), and lumbar spinal cord (22-25), where they exhibit a variety of projection patterns (26-30). Accordingly, the distinct properties of different subpopulations of Vsx2-expressing neurons dictate and restrict their specific contributions to neurological functions, such as reaching (19, 21, 31) and walking (18, 22, 23, 25, 31-34). Indeed, although developmentally defined V2a neurons located in the lumbar spinal cord have been implicated in the production of walking (18, 22, 23, 31-34), the ablation of all neurons in the thoracic spinal cord, including those expressing Vsx2, has no detectable impact on walking in uninjured rodents (9, 14). 
Given that thoracic neurons only become essential to walking after incomplete SCI (9), we asked whether SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons are transcriptionally perturbed after natural recovery. We compared neuronal nuclei from uninjured mice and mice that had recovered walking (Fig. 1, D and E, and fig. S3, A to E) after temporally and spatially separated lateral hemisection SCIs (fig. S4A and movie S1). High-quality transcriptional profiles were obtained 
Fig. 1. Transcriptional identification of neurons underlying natural spinal cord repair. (A) Overview of the experimental approach enabling projection-specific single-nucleus RNA sequencing (snRNA-seq) of neuronal subpopulations projecting from the thoracic spinal cord to walking execution centers. (B) Clustering tree of neuronal subpopulations in the thoracic spinal cord. (C) Uniform manifold approximation and projection (UMAP) visualization of neuronal nuclei revealing 28 neuron subpopulations (left). Individual nuclei are colored by the proportion of their nearest neighbors obtained from sorted projection neurons (eGFP density), revealing a primary origin from SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons (right). (D) Overview of the experimental approach enabling snRNA-seq after natural spinal cord repair. (E) Chronophotography of mice before and after natural spinal cord repair. (F) Walking was quantified using principal component analysis applied to gait parameters calculated from kinematic recordings. In this denoised space, each dot represents a mouse (n > 10 gait cycles per mouse, n = 6 mice per group, n = 5 mice in 
paring mice that had undergone natural 
repair versus uninjured mice (bottom). 


from 9264 nuclei (fig. S4, B to K) and were integrated with our projection-specific snRNA-seq experiment, wherein we identified and evaluated the same 28 neuronal subpopulations (Fig. 1G and fig. S4, L to O). Cell type prioritization (35, 36) revealed that SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons exhibited the most pronounced transcriptional response across all neuronal subpopulations embedded in the thoracic segments of mice that had recovered walking (Fig. 1G and fig. S4, N to Q), and Gene Ontology analysis (fig. S4R) revealed that these transcriptional responses involved the up-regulation of dendritic spine morphogenesis pathways, synaptic potentiation programs, and actin cytoskeleton reorganization, all consistent with an involvement in natural recovery. 
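Cell type prioritization ranks subpopulations by how strongly their transcriptional state changes between conditions. The cited method (35, 36) is not reproduced here. As a deliberately simplified stand-in, in the spirit of classifier-based prioritization, the sketch below scores each subpopulation by leave-one-out nearest-centroid accuracy at telling nuclei from recovered mice apart from uninjured nuclei. A higher score means a more separable, and therefore more perturbed, subpopulation. The feature vectors and cluster names are hypothetical.

```python
import math


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def loo_accuracy(uninjured, recovered):
    """Leave-one-out nearest-centroid accuracy at predicting each nucleus's condition.

    Each condition needs at least two nuclei so that a centroid remains after
    the held-out nucleus is removed.
    """
    groups = {"uninjured": uninjured, "recovered": recovered}
    correct = total = 0
    for label, members in groups.items():
        for i, x in enumerate(members):
            centroids = {}
            for other, other_members in groups.items():
                # Hold out nucleus i from its own group only.
                pool = members[:i] + members[i + 1:] if other == label else other_members
                centroids[other] = centroid(pool)
            predicted = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
            correct += predicted == label
            total += 1
    return correct / total


def prioritize(clusters):
    """Rank clusters (name -> (uninjured, recovered)) by LOO accuracy, highest first."""
    scores = {name: loo_accuracy(u, r) for name, (u, r) in clusters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-gene expression vectors for two toy subpopulations.
    toy = {
        "shifted": ([[0, 0], [0, 1]], [[10, 10], [10, 11]]),
        "unchanged": ([[0, 0], [1, 1]], [[0, 0], [1, 1]]),
    }
    print(prioritize(toy))
```

With indistinguishable conditions, leave-one-out nearest-centroid scoring can fall below chance, as the toy "unchanged" cluster shows. A practical analysis would use cross-validated classifiers with subsampling so that every subpopulation is scored on equal cell numbers.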


Connectome features of thoracic relay neurons


Our results thus far implied that SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons contribute to the production of walking after natural recovery. Therefore, we hypothesized that these neurons must possess connectome features compatible with the requirements to walk after paralysis. 
Visualization of the projectome from neurons embedded in the midthoracic spinal cord revealed dense projections throughout the lumbar gray matter (fig. S5A). To identify neuronal subpopulations possessing this projectome combined with a transcriptional phenotype consistent with our prioritized neuronal subpopulations (Fig. 1G), we compared the distribution and connectome of Vsx2^ON neurons using intersectional genetics and viral tracing in Vsx2^Cre mice. We found that Vsx2^ON neurons accounted for 5.9% of neurons in the midthoracic spinal cord (fig. S5B), which agreed with the distribution of neurons identified in our snRNA-seq data (fig. S2, D and E). Tracing of midthoracic Vsx2^ON neurons revealed the expected dense projections throughout the lumbar spinal cord (fig. S5C). 

We next asked whether midthoracic Vsx2^ON neurons could be stratified into subpopulations projecting locally versus over long distances. In the spinal cord, neurons with local versus long-distance projections can be differentiated by the expression of Zfhx3 (15) (fig. S2, C and D). To label long-distance projecting Vsx2^ON neurons, we infused rAAV2-Ef1a-DIO-Flpo into the lumbar spinal cord of Vsx2^Cre mice, followed by injections of AAV5-Con/Fon-eYFP (enhanced yellow fluorescent protein) into the midthoracic spinal cord (Fig. 2A, fig. S5D, and movie S1). This strategy enabled the exclusive labeling of Vsx2^ON neurons that projected to the lumbar spinal cord (Fig. 2A, fig. S5D, and movie S1). We found that Zfhx3 and Vsx2 colocalized only in neurons projecting to this region (SC^Vsx2::Hoxa7::Zfhx3→lumbar) (Fig. 2B) (15). Quantification of local (Vsx2^ON Zfhx3^OFF) versus long-distance projecting (Vsx2^ON Zfhx3^ON) Vsx2^ON neurons revealed a near-equal distribution of these two subpopulations throughout the midthoracic spinal cord (fig. S5, D and E). These findings confirmed that a subset of Vsx2^ON neurons embedded in the midthoracic spinal cord coexpress Zfhx3 and project to the lumbar spinal cord.

Fig. 2. Projection and connectome features of SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons. (A) Whole spinal cord visualization of projections from Vsx2 neurons in the lower thoracic spinal cord that project to walking execution centers. Insets illustrate the starter neurons labeled using intersectional viral tracing, and their projections in the lumbar spinal cord. (B) Vsx2 neurons with projections to walking execution centers express Zfhx3, a key marker of projection neuronal 

We reasoned that to function as relays of supraspinal commands, SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons must also receive direct projections from key supraspinal neurons involved in the recovery of walking after paralysis. To expose this connectome, we infused AAV5-CMV-TurboRFP (CMV, cytomegalovirus; RFP, red fluorescent protein) into the ventral gigantocellular nucleus (vGi), because vGi neurons are essential for this recovery (37), followed by infusions of rAAV2-hSyn-GFP into the lumbar spinal cord and labeling of Vsx2 and vGlut2 synaptic puncta (Fig. 2C). As anticipated, we found that SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons located in the midthoracic spinal cord receive projections from the vGi (Fig. 2C), and that this projection pattern is maintained after natural recovery (fig. S5, F and G). 

These results indicated that among the diverse populations of neurons in the midthoracic spinal cord (5, 31, 35, 38-40), SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons were not only the most transcriptionally perturbed neuronal subpopulation during natural recovery but also exhibited the relevant anatomical profile to relay supraspinal commands past the incomplete SCI to the lumbar spinal cord. 


Regeneration of axons to their natural target 
region after complete SCI 


We previously found that providing factors essential for axon growth during development supported axon regeneration across anatomically complete SCIs into viable neural tissue located one segment below the injury, but that this regrowth did not restore walking (1). On the basis of our findings above, we hypothesized that recovery of walking after complete SCI could be achieved by reestablishing the natural projection patterns of neuronal subpopulations that contribute to recovery of walking after incomplete SCI. We therefore sought to determine whether SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons could be regenerated to reach their natural target region in the lumbar spinal cord. 

To test this possibility, we adapted our previously established regeneration strategy that harnesses three developmental mechanisms (movie S2) (1). First, we reactivated the intrinsic growth capacity of neurons located above the SCI with viral overexpression of osteopontin (Spp1), insulin-like growth factor 1 (Igf1), and ciliary neurotrophic factor (Cntf) (AAV-OIC) (42). Second, we induced the formation of axon growth-supportive substrates within the lesion with temporally delayed delivery of fibroblast growth factor 2 (FGF2) and epidermal growth factor (EGF). Third, we delivered biomaterial depots of glial-derived neurotrophic factor (GDNF) as a chemoattractive agent below the injury (1, 42, 43). Analysis of snRNA-seq data confirmed the expression of the GDNF receptor components Gfra1 and Ret in SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons, both of which are required for appropriate GDNF signaling, and immunohistochemistry of Vsx2^ON axons traced with AAV5-Con/Fon-eYFP validated expression of the GDNF receptor within the soma and along the axons of SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons (Fig. 2B). 

Consistent with our previous observations (1), we found that stimulated, supported, and chemoattracted axons regrew through astrocyte borders, across the fibrotic scar, and into viable neural tissue below a complete SCI (fig. S6A). Nevertheless, regenerating axons terminated only one segment below the injury, where the most distal GDNF-containing biomaterial depot had been infused. Accordingly, behavioral assessments conducted at 4 weeks after injury failed to detect any recovery (fig. S6B). This observation contrasted with the natural recovery of walking observed by 4 weeks after incomplete SCI, which involves SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons whose axons terminate within the lumbar spinal cord several segments more distally (Figs. 1F and 2A).

subpopulations. These neurons also express Gdnfr on the neuron soma as well as along the axon. (C) Overview of the experimental approach enabling anterograde viral tracing of ventral gigantocellular nucleus (vGi) neurons in both uninjured mice and mice that underwent natural repair. A 3D view of synapse-like contacts between Vsx2 neurons in the lower thoracic spinal cord and virally traced vGi projections, identified by the presynaptic marker vGlut2. 

We therefore reasoned that the recovery of 
walking after complete SCI cannot be achieved 
simply by bridging the lesion gap with short- 
distance or undirected regeneration, but that 
one of the key additional requirements must be 
to propel axons to their natural target region in 
the lumbar spinal cord. To achieve such long- 
distance and directed regeneration, we placed 
an additional depot of chemoattractive GDNF 
into the lumbar spinal cord (fig. S6C). However, 
this additional depot attracted comparatively 
few axons to the targeted lumbar region (figs. 
S6C and S7B), and behavioral assessments again 
failed to detect any recovery (fig. S6D). 

We then posited that the relatively slow time 
course of long-distance axon growth, matura- 
tion, and synapse formation might require a 
more sustained and higher concentration of 
chemoattractive growth factor delivery than 
was provided by the biomaterial depot (44). To 
test this possibility, we engineered a lentivirus 
to provide sustained delivery of growth fac- 
tor (45). Replacing biomaterial depots with 
lentivirus-mediated GDNF expression enabled 
an extensive regrowth of axons to their natural 
target region over two segments distally (Fig. 3, 
A to C; fig. S7; and movie S2), further demon- 
strating that appropriate chemoattraction gra- 
dients can guide directed long-distance axon 
regeneration in a manner similar to that of de- 
velopment (46). 

To determine whether regenerated axons in- 
cluded those originating from SC’S*"He274nstumbar 
neurons, we infused rAAV2-hSyn-KASH-eGFP 
into the lumbar spinal cord. This strategy ex- 
clusively expressed eGFP in neurons whose axons 
had regrown sufficient distances to reach the 
lumbar spinal cord (fig. S8A). Nuclei of eG@FPON 
neurons located above the injury were sorted 
for snRNA-seq, and the resulting transcrip- 
tional profiles were integrated into our atlas 
of the thoracic spinal cord (fig. S8, B to G). 
Comparing the distribution of eGFP^ON neurons
to the distribution of neuronal subpopulations
in the uninjured spinal cord showed
that SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons were the
main virally labeled subpopulation, confirming
the successful regeneration of this neuronal
subpopulation to its natural target region (47)
(Fig. 3E and fig. S8, H and I). Retrograde tracing
coupled with immunohistochemistry of
Vsx2 confirmed these results (Fig. 3F and fig. 
S8J). Transcriptional profiles showed that, 
compared to all other neuronal subpopulations, 
regenerated SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons
up-regulated axon regeneration pathways, Igf 
receptor signaling, and synaptic formation and 
transmission programs, as well as axon exten- 
sion and maturation pathways, consistent with 
their regeneration and stabilization within the 
lumbar spinal cord (fig. S8K). 
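The comparison between labeled nuclei and the uninjured atlas can be illustrated with a simple ratio of observed to expected shares. The sketch below is a hypothetical illustration, not the analysis used in this study: the subpopulation names and counts are invented, and each subpopulation is scored by its share of eGFP^ON nuclei divided by its share of the reference atlas, so values above 1 indicate over-representation among the regenerated neurons.

```python
def enrichment(labeled_counts, atlas_counts):
    """Observed share of labeled nuclei divided by expected share from the atlas.

    labeled_counts: {subpopulation: number of eGFP-positive nuclei}
    atlas_counts:   {subpopulation: number of nuclei in the reference atlas}
    Both dictionaries hold hypothetical counts for illustration only.
    """
    total_labeled = sum(labeled_counts.values())
    total_atlas = sum(atlas_counts.values())
    scores = {}
    for name, atlas_n in atlas_counts.items():
        observed = labeled_counts.get(name, 0) / total_labeled
        expected = atlas_n / total_atlas
        scores[name] = observed / expected
    return scores


if __name__ == "__main__":
    # Invented counts: "pop_C" is the least abundant group in the atlas but
    # dominates the labeled nuclei, so it receives the highest score.
    atlas = {"pop_A": 500, "pop_B": 300, "pop_C": 200}
    labeled = {"pop_A": 10, "pop_B": 20, "pop_C": 70}
    for name, score in sorted(enrichment(labeled, atlas).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {score:.2f}")
```

A ratio on its own carries no significance estimate; a statistical test, and a check across clustering resolutions as in Fig. 3E, would be needed before drawing conclusions from it.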

To test whether regenerated SC^Vsx2::Hoxa7::Zfhx3→lumbar
neurons projected to the lumbar spinal cord in 
a manner similar to that found in uninjured 
mice, we injected AAV5-hSyn-flex-tdTomato 
into the thoracic spinal cord of Vsx2^Cre mice
that had undergone regeneration after com- 
plete SCI. We combined this tract tracing with 
immunolabeling of Vsx2^ON and Chat^ON neurons
in the lumbar spinal cord. Regenerated
axons were found around, and made contacts 
with, these neuronal subpopulations that are 
known to contribute to the production of walk- 
ing (18, 22, 23, 31-34) and are essential to re- 
gain walking after paralysis (25) (fig. S7E). 
Uninjured mice exhibited a similar projec- 
tion pattern, suggesting that regenerated 
SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons may inherently
reform appropriate connections with their
natural targets (fig. S7E) (47). 

We also asked whether supraspinal commands 
could be detected below the anatomically complete
SCI after regenerating SC^Vsx2::Hoxa7::Zfhx3→lumbar
neurons to reach the lumbar spinal cord. We 
found that microstimulation of the vGi induced 
large motor evoked potentials in leg muscles, 
revealing that supraspinal centers had regained 
functional access to the lumbar spinal cord
(Fig. 3D). These results demonstrate that
SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons can be engineered
to regrow functional axons to their
natural target region in the lumbar spinal cord.
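The quantification reported for Fig. 3D (peak-to-peak amplitude of evoked potentials, expressed as a percentage of uninjured responses) reduces to simple arithmetic. A minimal sketch, assuming each trace is a list of voltage samples and that the reference is the mean peak-to-peak amplitude across uninjured mice; both the normalization reference and the traces below are assumptions made for illustration.

```python
def peak_to_peak(trace):
    """Peak-to-peak amplitude: maximum minus minimum sample of one trace."""
    if not trace:
        raise ValueError("trace must contain at least one sample")
    return max(trace) - min(trace)


def percent_of_uninjured(traces, uninjured_traces):
    """Express each trace's peak-to-peak amplitude as a percentage of the
    mean uninjured amplitude (an assumed normalization reference)."""
    reference = sum(peak_to_peak(t) for t in uninjured_traces) / len(uninjured_traces)
    return [100.0 * peak_to_peak(t) / reference for t in traces]


if __name__ == "__main__":
    uninjured = [[0.0, 2.0, -2.0, 0.0], [0.0, 3.0, -1.0, 0.0]]  # both 4.0
    treated = [[0.0, 1.0, -1.0], [0.5, -0.5]]                     # 2.0 and 1.0
    print(percent_of_uninjured(treated, uninjured))  # [50.0, 25.0]
```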


Substantial recovery of walking after anatomically complete SCI


Because our regenerative strategy chemo- 
attracted and guided a molecularly defined 
neuronal subpopulation involved in natural 
recovery to regrow to their appropriate target 
region in the lumbar spinal cord, we antici- 
pated that this strategy may restore walking 
after complete paralysis. We therefore per- 
formed longitudinal quantification of whole- 
body kinematics during walking in five separate 
cohorts of mice that underwent anatomically 
complete SCI and received the regeneration 
strategy. Evaluations showed that the SCI 
abolished leg movements in every mouse, such 
that even at 4 weeks after SCI, no mice exhib- 
ited any sign of recovery (fig. S9A). In all mice, 
the regenerative strategy promoted the growth 
of projections from SC^Vsx2::Hoxa7::Zfhx3→lumbar
neurons to their natural target region in the
lumbar spinal cord. This regrowth coincided 
with a progressive recovery of leg movements 
that emerged ~3 to 4 weeks after SCI (fig. S9, B 
to E). Final evaluations were performed at 
8 weeks. In cohort one, five out of six mice 
displayed gait patterns that resembled those 
quantified in mice after incomplete SCI (Fig. 4, 
A to D; fig. S9F; and movie S2). These experiments
were repeated in four subsequent
cohorts, with a further 22 out of 24 mice (or a 
total of 27 out of 30 mice) demonstrating sim- 
ilar results (fig. S10, A to C). These results 
demonstrate that our regeneration strategy 
led to substantial recovery of walking after 
complete SCI. Notably, the mice that under- 
went regeneration did not walk as well as un- 
injured mice but instead exhibited a behavioral 
phenotype that was comparable to that of 
mice after incomplete SCI (9). 
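Comparing a mouse's gait with reference groups can be illustrated with a nearest-centroid rule. The classifier actually used in this study is described in the materials and methods and is not reproduced here. The sketch below is an assumed stand-in that assigns a mouse to the group whose mean feature vector (for example, hypothetical principal-component scores of its gait) lies closest in Euclidean distance. All group names and values are invented.

```python
import math


def centroid(rows):
    """Mean feature vector of a group of equally sized vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def nearest_centroid(sample, groups):
    """Return the name of the group whose centroid is closest to sample."""
    centroids = {name: centroid(rows) for name, rows in groups.items()}
    return min(centroids, key=lambda name: math.dist(sample, centroids[name]))


if __name__ == "__main__":
    # Invented two-dimensional gait scores for three reference groups.
    groups = {
        "uninjured": [[1.0, 0.0], [1.2, 0.1]],
        "incomplete SCI (natural repair)": [[0.5, 0.5], [0.6, 0.4]],
        "complete SCI only": [[-1.0, -1.0], [-0.8, -1.1]],
    }
    print(nearest_centroid([0.55, 0.45], groups))  # incomplete SCI (natural repair)
```

A nearest-centroid rule is chosen here only because it is short and transparent; it ignores within-group variance, which a more complete classifier would take into account.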

Our regenerative strategy promoted the re- 
growth of projections from neuronal subpop- 
ulations of the thoracic spinal cord other than 
SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons (fig. S8H),
and we thus could not exclude the involvement 
of these relatively less abundant neuronal sub- 
populations in the recovery of walking. Therefore, 
we tested the necessity of SC^Vsx2::Hoxa7::Zfhx3→lumbar
neurons with regenerating projections to their 
appropriate target region in the lumbar spinal 
cord for the recovery of walking after regener- 
ation, given their noted involvement in natu- 
ral recovery after incomplete SCI. To do so, we 
first ablated these neurons in a third cohort of 
mice by expressing the diphtheria toxin receptor
(DTR) in the thoracic spinal cord of Vsx2^Cre
mice (Fig. 4E; fig. S11, A and B; and movie S3). 
Eight weeks after SCI, all mice in cohort three 
that received our regeneration strategy had 
regained the ability to walk with gait patterns 
resembling those quantified in mice that had 
recovered walking after incomplete SCI (Fig. 
4, F to H, and fig. S11, C to F). Administration 
of diphtheria toxin reparalyzed every tested
mouse (Fig. 4, F to H; fig. S11, C to F; and
movie S3). Anatomical analyses confirmed the
near-complete ablation of Vsx2^ON neurons in
the thoracic spinal cord (fig. S11B). These results
established the role of thoracic Vsx2^ON
neurons in the recovery of walking after regeneration.
However, they did not establish the
respective roles of local versus projection Vsx2^ON
neurons.

Therefore, we tested the necessity of projections
from SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons to
the lumbar spinal cord in the recovery of walking
after complete SCI. We designed an intersectional
chemogenetic strategy that allowed us
to silence regenerated SC^Vsx2::Hoxa7::Zfhx3→lumbar
neurons once mice demonstrated substantial
recovery of walking. We infused rAAV2-Ef1a-DIO-Flpo
into the lumbar spinal cord of a fourth
cohort, followed by injections of AAV5-Con/Fon-hM4Di-mCherry
into the thoracic spinal cord (Fig. 4I; fig. S12, A and B;
and movie S3). These mice all demonstrated the expected
recovery of walking. The administration of
clozapine-N-oxide (CNO) immediately impaired
walking in all tested mice, leading to gait
patterns that resembled those of mice that had
not undergone regeneration (Fig. 4, J to L; fig.
S12, B to D; and movie S3). By contrast, a fifth
cohort of mice that did not receive AAV5-Con/Fon-hM4Di-mCherry
infusions was unaffected
by CNO administration (fig. S12E). These findings
established that regenerated projections
from SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons to their
natural target region in the lumbar spinal cord
contribute to the substantial recovery of walking
after complete SCI.

Fig. 3. SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons regenerate across an anatomically
complete SCI. (A) Overview of the experimental approach enabling regeneration
across an anatomically complete SCI and into walking execution centers.
(B) tRFP-labeled axons in composite tiled scans of horizontal sections from
representative mice. Dotted lines demarcate astrocyte proximal and distal
borders around the lesion core. Dashed line demarcates the lesion center. Line
graph demonstrates axon density at specific distances past lesion centers
(normalized to the density rostral to the lesion site). Statistics indicate Tukey
honest significant difference (HSD) following one-way repeated measures
analysis of variance (ANOVA). ***P < 0.001 LV-GDNF versus SCI only.
†††P < 0.001 LV-GDNF versus 2 depots and 3 depots groups. (Right) Bar graph
indicates the AUC of axon density in the walking execution center. Statistics
indicate Tukey HSD following one-way ANOVA (all P < 0.001). (C) Whole spinal
cord visualization of regenerating projections from the lower thoracic spinal cord
that project to walking execution centers. (D) Representative individual
electrophysiological traces after microstimulation of the ventral gigantocellular
nucleus (vGi). (Bottom) Peak-to-peak amplitude of the evoked potentials in each
experimental group, expressed as a percentage of uninjured mouse responses
(pairwise Wilcoxon rank-sum test, P = 0.0064). (E) (Top) The enrichment of
regenerated neurons among neuronal populations of the mouse lumbar spinal
cord is shown within a clustering tree of spinal cord neurons defined at four
different clustering resolutions, demonstrating the robustness of these findings
to the resolution at which transcriptionally defined neuronal subtypes are
defined. (Bottom) The same enrichments are visualized on a progression of
UMAPs. (F) Cholera toxin subunit B (CTB)-labeled regenerated neurons with
Vsx2 immunohistochemical colabeling above the anatomically complete SCI.

Fig. 4. Axons from SC^Vsx2::Hoxa7::Zfhx3→lumbar neurons are necessary to
restore walking after anatomically complete SCI. (A) Overview of the
experimental approach enabling regeneration across an anatomically complete
SCI and into walking execution centers. (B) Chronophotography of walking
with (bottom) and without (top) mechanism-based combinatorial regeneration
mimicking natural repair processes. (C) Walking was quantified using principal
component analysis as described in Fig. 1F (n > 10 gait cycles per mouse,
n = 6 mice per group, n = 5 mice in the uninjured and SCI only groups). Raw
data and statistics are provided in data S6. (D) The number of mice from two
cohorts of combinatorial treated animals at 8 weeks post-SCI that were assigned
to each main experimental group. Mice were almost exclusively assigned by
the classifier (see materials and methods, "Behavioral assessments") to the
natural repair group, indicating that the walking patterns of regenerated mice
most resemble those that underwent natural repair. (E) Experimental design for
cell type-specific diphtheria toxin-mediated neuron ablation of Vsx2^ON neurons
in the mid-thoracic spinal cord and intersectional chemogenetic inactivation
of regenerated Vsx2^ON neurons following mechanism-based combinatorial
regeneration. (F) Chronophotography of walking in Vsx2^Cre mice that received
mechanism-based combinatorial regeneration mimicking natural repair processes
coupled to viral injections of AAV5-CAG-FLEX-DTR to induce cell
type-specific neuronal ablation. (G) Walking was quantified using principal
component analysis as described in Fig. 1F (n > 10 gait cycles per mouse,
n = 4 mice per group, n = 5 mice in the SCI only group). Raw data and statistics
are provided in data S8. (H) Mice were recorded before (left) and after (right)
1-week administration of diphtheria toxin. Bar graphs indicate the number of
mice from each group that were assigned to each main experimental group
(see materials and methods, "Behavioral assessments"). (I) As in (E) but for
chemogenetic inactivation.
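The axon-density quantification described for Fig. 3B (density at distances past the lesion center, normalized to the density rostral to the lesion, and summarized as an area under the curve over the walking execution center) can be sketched with the trapezoidal rule. The distances, densities, and window bounds below are invented, and trapezoidal integration is an assumption about how such an AUC might be computed, not a description of the authors' pipeline.

```python
def normalize(densities, rostral_density):
    """Divide each density by the density measured rostral to the lesion."""
    if rostral_density <= 0:
        raise ValueError("rostral density must be positive")
    return [d / rostral_density for d in densities]


def auc_in_window(distances, values, start, end):
    """Trapezoidal area under the curve using only samples with
    start <= distance <= end (distances must be sorted ascending)."""
    points = [(x, y) for x, y in zip(distances, values) if start <= x <= end]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    distances_mm = [0.0, 1.0, 2.0, 3.0]   # hypothetical distances past the lesion center
    densities = [40.0, 30.0, 20.0, 10.0]  # hypothetical axon counts per unit area
    normalized = normalize(densities, rostral_density=40.0)
    print(normalized)                                         # [1.0, 0.75, 0.5, 0.25]
    print(auc_in_window(distances_mm, normalized, 1.0, 3.0))  # 1.0
```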


Discussion 


In this study, we investigated the degree to 
which recovery of function after anatomically 
complete SCI will require targeting character- 
ized neurons and regenerating the axons from 
those neurons not simply across lesions but 
also to guide them to reach their natural tar- 
get regions. To address this question, we first 
characterized the molecular identity of the 
neuronal subpopulations in the thoracic spi- 
nal cord that restore walking by relaying sup- 
raspinal commands past an incomplete SCI 
(9). We then traced the connectome of these 
neurons and found that their natural projec- 
tion pattern extends several segments caudally 
to the lumbar spinal cord, where the neurons 
that produce walking reside. We then used our 
multipronged regeneration strategy (1) to stimulate
the axons of these molecularly charac-
terized neurons to regenerate through fibrotic 
lesion core tissue and into spared neural tissue 
caudal to the lesion. This strategy included re- 
activating dormant neuron-intrinsic growth 
programs, establishing matrix support for axons 
to grow through non-neural lesion core tissue, 
and supplying a gradient of chemoattraction 
to guide these axons to the caudal side of the 
injury (1). We show that regenerating these
neurons to reach simply across lesions had 
no effect on the recovery of walking. By con- 
trast, refining our strategy to enable graded 
chemoattraction and guidance of regenerat- 
ing axons to their natural target region in the 
lumbar spinal cord promoted substantial re- 
covery of walking after complete SCI. We ap- 
plied projection-specific snRNA-seq to identify 
the neuronal subpopulations with regenerat- 
ing axons past a complete SCI and demonstrated
that our strategy regenerated axons
from the neuronal subpopulations that restore 
walking after incomplete SCI. 

Our causation-testing loss-of-function experi- 
ments show that the restoration of function 
was dependent on the regenerated axons of 
characterized neurons. To chemoattract regen- 
erating axons, we expressed GDNF, a pleio- 
tropic growth factor with the potential to affect 
different cells. Potential limitations of our
study are that GDNF may have had unexplored
effects on lumbar spinal cord cells that
facilitated the re-formation of functional
connections, or that greater bulk regeneration
alone may account for the better functional outcome.

These findings show that reestablishing the 
projections of molecularly defined neuronal 
subpopulations to their natural target region 
forms an essential yet previously unidentified 
requirement for axon regeneration strategies 
aimed at restoring lost neurological functions. 
This understanding has important implications 
for the design of therapies for larger mammals 
and humans, because the potentially long dis- 
tance over which regenerated projections will 
have to grow to restore function may require 
strategies with complex spatial and temporal 
features. 

We posit that applying the principles dem- 
onstrated here of (i) identifying and regenerat- 
ing the axons of functionally relevant neuronal 
subpopulations, (ii) determining the require- 
ments for reactivating neuron-specific devel- 
opmental growth programs, (iii) identifying 
chemoattractants able to guide different types 
of transected axons past lesions to reach their 
natural target regions, and eventually combin- 
ing these biological repair principles with complementary
neuromodulation strategies (10, 14, 48),
will unlock the framework to achieve mean- 
ingful repair of the injured spinal cord and may 
expedite repair after other forms of central ner- 
vous system injury and disease (49-57). 
MINING IMPACTS


Impacts of metal mining on river systems: a global assessment


M. G. Macklin*, C. J. Thomas, A. Mudbhatkal, P. A. Brewer, K. A. Hudson-Edwards,
J. Lewin, P. Scussolini, D. Eilander, A. Lechner, J. Owen, G. Bird,
D. Kemp, K. R. Mangalaa


An estimated 23 million people live on floodplains affected by potentially dangerous concentrations of 
toxic waste derived from past and present metal mining activity. We analyzed the global dimensions 
of this hazard, particularly in regard to lead, zinc, copper, and arsenic, using a georeferenced global 
database detailing all known metal mining sites and intact and failed tailings storage facilities. We 
then used process-based and empirically tested modeling to produce a global assessment of metal 
mining contamination in river systems and the numbers of human populations and livestock exposed. 
Worldwide, metal mines affect 479,200 kilometers of river channels and 164,000 square kilometers 
of floodplains. The number of people exposed to contamination sourced from long-term discharge
of mining waste into rivers is almost 50 times greater than the number directly affected by tailings
dam failures.


In 2018, mining had a market capitalization
of almost a trillion US dollars and $600
billion in revenue (1). It has been estimated
that the annual production of solid
mine wastes (including those from metal
mining) now makes up one-third of the
sediment budget for the Earth (2, 3), and that
~1 million km² of the world is covered with mine
waste (4). Many of the richest geological depos- 
its are being or have already been exploited, 
and companies are now turning to deposits 
with lower-grade ores. These lower-grade ores 
generate more waste per unit extracted, and 
damage to the Earth’s surface is likely to be 
exacerbated (5). Some of these wastes contain 
elements such as arsenic, lead, and mercury in 
concentrations that may pose a serious risk 
to ecosystems and human health at multiple 
trophic levels (6). 
Lincoln Centre for Water and Planetary Health, University of
Lincoln, Lincoln, UK. Innovative River Solutions, Institute
of Agriculture and Environment, Massey University, Palmerston
North, New Zealand. Centre for the Study of the Inland,
La Trobe University, Melbourne, Australia. University of
Namibia, Windhoek, Namibia. Department of Geography and
Earth Sciences, Aberystwyth University, Aberystwyth,
Ceredigion, UK. Environment & Sustainability Institute and
Camborne School of Mines, University of Exeter, Penryn,
Cornwall, UK. Institute for Environmental Studies, Vrije
Universiteit Amsterdam, Amsterdam, Netherlands.
Department of Inland Water Systems, Deltares, Delft,
Netherlands. Monash University Indonesia, Jakarta, Indonesia.
Centre for Development Support, University of the Free State,
Bloemfontein, South Africa. School of Natural Sciences,
Bangor University, Bangor, Gwynedd, UK. Centre
for Social Responsibility in Mining, Sustainable Minerals
Institute, The University of Queensland, St Lucia, Australia.
Ministry of Earth Sciences, Government of India,
New Delhi, India.
*Corresponding author. Email: mmacklin@lincoln.ac.uk

Macklin et al., Science 381, 1345-1350 (2023)

Various multilink exposure pathways exist
for humans to ingest or inhale contaminant
metals from mine sites and floodplain soils
(6). For example, plants and crops grown domestically
or commercially on contaminated
soils or irrigated by water contaminated by 
mine waste frequently contain high concen- 
trations of metals and metalloids (hereafter 
referred to as “metals”) (7-9). Animals grazing 
on floodplains may then eat this plant mate- 
rial and sediment, especially after flooding, 
when fresh metal-rich sediment is deposited 
(10). This poses a potential risk to their health 
and to that of the humans who consume their 
meat and milk (10, 11). Fish and shellfish are
also substantial accumulators of metals and 
represent an important route by which con- 
taminants enter the food chain, especially in 
communities that rely on aquatic resources 
(12, 13). In tropical and subtropical regions, 
the consumption of insects (entomophagy) is 
becoming an increasingly important source of 
protein, especially where human populations 
do not have access to meat. Metals also bio- 
accumulate in insects that live in close prox- 
imity to mine sites, which can then pose a 
potential health risk to humans who use them 
as a major protein source (14).

Metal mining represents humankind’s ear- 
liest and most persistent form of environmen- 
tal contamination. Waste from mining began 
to contaminate river systems as early as 7000 
years ago (15). Water was usually involved in 
the extraction and processing of metal ores, 
resulting in metals (both dissolved and sedi- 
ment associated) being supplied to streams 
and rivers, dispersed downstream, and then 
deposited across floodplains that were used 
for agricultural food production. Since the 
mid-19th century, tailings dams have been 
used to store mine waste, which has reduced 
the direct supply into rivers. However, such 
structures are prone to failure, with often se- 
vere consequences for ecosystems and human 
communities downstream (16).
Groundbreaking research (17-19) over the
past 40 years has demonstrated the role of
dispersal (20), storage (21-23), and remobi- 
lization (24) processes in the environmental 
fate of metals within rivers affected by metal 
mining, including those affected by long-term 
mining activities and those contaminated by 
tailings dam failures (TDFs). These studies have 
shown that >90% of metals are sediment asso- 
ciated, are transported 10 to 100 km downstream 
from the point where mining operations dis- 
charge into a watercourse, and are deposited 
and stored along river channels and especially 
on floodplains for extended (10² to 10⁴ years)
time periods (18, 25). In the first industrial na-
tions of western Europe and the US, flood-related 
remobilization of contaminated floodplain sed- 
iment resulting from historical mining during 
the 19th and early 20th centuries (19, 21, 24) now 
constitutes the primary source of metal contaminants
in rivers. Small catchments (<500 km²)
can be extremely contaminated, but the larger 
rivers into which they feed tend to have con- 
siderably lower contamination levels because 
metal mine waste is either stored in upstream 
floodplains (26) or is diluted by uncontaminated 
sediment from nonmining sources (27). 
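The dilution effect described above follows from a simple mass balance. As an illustrative sketch (a textbook two-end-member mixing model, not the model used in this study), suppose a contaminated tributary carrying sediment load L_m at metal concentration C_m joins a river carrying uncontaminated sediment load L_b at background concentration C_b, and no sediment is stored or lost at the confluence. The mixed concentration is then C_mix = (C_m·L_m + C_b·L_b)/(L_m + L_b). The loads and concentrations below are hypothetical.

```python
def mixed_concentration(c_mine, load_mine, c_background, load_background):
    """Sediment-weighted mean metal concentration downstream of a confluence.

    Assumes conservative mixing: no storage, loss, or chemical change.
    Concentrations in mg/kg, sediment loads in t/yr (hypothetical values).
    """
    total_load = load_mine + load_background
    if total_load <= 0:
        raise ValueError("total sediment load must be positive")
    return (c_mine * load_mine + c_background * load_background) / total_load


if __name__ == "__main__":
    # A small mining-affected tributary (2000 mg/kg lead, 1000 t/yr) joining a
    # larger river with background sediment (30 mg/kg, 99,000 t/yr).
    print(round(mixed_concentration(2000.0, 1000.0, 30.0, 99000.0), 2))  # 49.7
```

Because the model ignores floodplain storage, it captures only the dilution half of the explanation given in the text. Storage upstream would lower downstream concentrations further.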

Here, we bring together all spatial data that 
can at present be obtained globally on metal 
mines (both active and inactive) and tailings 
dams, including those that have failed. We then 
calculate the area of floodplains and the num- 
ber of people and livestock potentially exposed 
[see (28)]. This quantifies, for the first time, the 
off-site environmental impacts of metal min- 
ing activity on river systems worldwide, and the 
consequent number of people and livestock that 
could potentially be exposed to unacceptably 
high concentrations of toxic metals. 


Methodology 


Data on active (defined as still in operation in 
database sources published or accessed before 
29 August 2022) and inactive (defined in data- 
base sources as closed) metal mines worldwide, 
including their location, mineral commodities, 
and operational status, were compiled into the 
Water and Planetary Health Analytics (WAPHA) 
global metal mines database (29) using QGIS 
software (30). Mine information was acquired 
from the US Geological Survey Mineral Resources
Data System (31) (73,917 mines world-
wide), the BritPits database of the British 
Geological Survey (32) (8459 mines in the UK), 
the S&P Global Market Intelligence database 
(33) (2584 mines worldwide), and our own com- 
pilation of ~100,000 additional mines from aca- 
demic and gray literature, including regional 
data published by government agencies and 
industry (tables S1 and S2). Twenty-one types 
of active and inactive metal mines were used 
in our modeling and analysis (table S3, A and B). 
We also compiled a georeferenced global data- 
base of metal mining tailings storage facilities 
(TSFs) and TDFs based on the International
Commission on Large Dams/United Nations 
Environment Programme (ICOLD/UNEP) 2001 
compilation in bulletin 121 (34), the World 
Information Service on Energy (35), and the 
World Mine Tailings Failures and Global 
Tailings Portal databases (36), in conjunction 
with our own compilation of source literature 
published by government and nongovernment 
organizations (29) (tables S4 and S5). Together, 
these spatial data represent, to our knowledge, 
the most comprehensive compilation of metal 
mine locations to date. 

We identified catchments affected by active 
and inactive metal mining by overlaying in 
MATLAB (37) all mines, TSFs, and TDFs onto 
level 4 polygons of the HydroBASINS modeling 
framework (38). These depict watershed bound- 
aries and subbasin delineations at 15-arc 
sec resolution. Within all subbasins, we esti- 
mated the length of river channel (in kilometers), 
the floodplain area (in square kilometers), 
and the 100-year flood inundation area (in 
square kilometers) downstream of each mine 
likely to be contaminated using a new process- 
based model of sediment-associated mining 
contaminant dispersal (figs. S1 to S12 and 
table S6). This model calculates the extent 
downstream of a mine where concentrations 
of metal (copper, ~10.3 km; lead, ~8.6 km; and 
zinc, ~6.5 km) and arsenic (~45.6 km) in river 
channel and floodplain sediments exceed guide- 
line values for intervention and remediation 
(table S7). We ground-truthed our results in 
15 catchments across Europe (UK, Romania, 
and Bulgaria), ranging in size from 46 to 
232,193 km² (tables S8 to S11). Where tailings
dams have failed and their prefailure crest 
height and volume of impounded waste are 
known (165 from a total of 257), the length of 
river channel and area of floodplain affected 
was calculated (39). Using the Socioeconomic 
Data and Applications Center (NASA-SEDAC) 
population data of the year 2020 (40) and the 
Food and Agriculture Organization (FAO) Gridded 
Livestock of the World database (GLW v3.1) 
(41), the number of people and livestock (cattle, 
goats, and sheep) living on mining-affected 
floodplains was determined (tables S12 to S14). 
The area of irrigated land based on the FAO 
Global Map of Irrigation Areas (GMIA) in 
mining-affected floodplains was also calculated 
(table S15). Our geospatial integration of metal 
mine, TSFs, TDFs, hydrographic, geomorphic, 
demographic, and livestock databases en- 
abled us to evaluate globally the human pop- 
ulation directly exposed and the number of 
livestock in contaminated areas with the po- 
tential for uptake of contaminant metals into 
the human food chain (table S15). 
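Two steps of the procedure above can be sketched as follows: assigning each mine to the subbasin that contains it, and flagging the river reach downstream of that mine. This is a simplified illustration under stated assumptions, not the authors' MATLAB workflow or their process-based dispersal model. It treats subbasins as plain polygons in planar coordinates, whereas real HydroBASINS polygons are geographic and may contain holes. It also takes each mine's affected reach as the largest of the approximate per-contaminant extents quoted above for the commodities it produces, capped at the channel length available downstream. The basin geometries and the `affected_length_km` rule are hypothetical.

```python
# Approximate downstream extents (km) quoted in the text. Using them as fixed
# constants is a simplification; the study's model computes extents per mine.
EXTENT_KM = {"copper": 10.3, "lead": 8.6, "zinc": 6.5, "arsenic": 45.6}


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray cast in the +x direction."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_basin(x, y, basins):
    """Return the id of the first basin polygon containing the point, or None."""
    for basin_id, polygon in basins.items():
        if point_in_polygon(x, y, polygon):
            return basin_id
    return None


def affected_length_km(commodities, channel_length_km):
    """Hypothetical rule: longest extent among the mine's commodities,
    limited by the length of channel available downstream."""
    extent = max((EXTENT_KM[c] for c in commodities if c in EXTENT_KM), default=0.0)
    return min(extent, channel_length_km)


if __name__ == "__main__":
    basins = {
        "basin_1": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "basin_2": [(10, 0), (20, 0), (20, 10), (10, 10)],
    }
    print(assign_basin(12, 5, basins))                      # basin_2
    print(affected_length_km(["lead", "zinc"], 50.0))       # 8.6
    print(affected_length_km(["copper", "arsenic"], 20.0))  # 20.0
```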


Results 


Worldwide, 22,609 active and 159,735 abandoned mines,
11,587 TSFs, and a further 257 reported TDFs have been
recorded (Fig. 1). Metal mining has affected some 164,400 km²
of floodplains (112,400 km² from inactive mines and
52,000 km² from active mines), and 480,700 km
of river channels (active, 114,000 km; inactive,
365,200 km) are affected by mining (Fig. 2 and
table S16). We estimate that 23.48 million
people live on mining-affected floodplains that
also support 5.72 million livestock and include
65,600 km² of irrigated land (Fig. 3 and table
S16). Disaggregated on a continental scale, North
America (active, 11,871; inactive, 80,995) and
Oceania (active, 3430; inactive, 53,233) have
the largest number of mines, followed by South
America (active, 3240; inactive, 14,577), Europe
(active, 1024; inactive, 9080), Asia (active, 1817;
inactive, 1473), and Africa (active, 1227; inactive,
377) (table S1). Oceania, Europe, North
America, and South America are mostly affected
by inactive mining, whereas active mining
activities are more important in Africa and
Asia (table S1).

Fig. 1. Global distributions (Equal Earth projection) of active and inactive
metal mines and intact and failed TSFs by site and summed by continent.
Shown are inactive metal mines [(A), solid blue circles], active metal mines
[(B), solid red circles], number of active/inactive mines by continent (C),
TSFs [(D), blue triangles = intact, red triangles = failed], and number of
intact and failed TSFs by continent (E). Disaggregated on a continental
scale, North America (active, 11,871; inactive, 80,995) and Oceania
(active, 3430; inactive, 53,233) have the largest number of mines, followed
by South America (active, 3240; inactive, 14,577), Europe (active, 1024;
inactive, 9080), Asia (active, 1817; inactive, 1473), and Africa (active, 1227;
inactive, 377) (table S1). Oceania, Europe, North America, and South America
are mostly affected by inactive mining, whereas active mining activities are
more important in Africa and Asia (table S1). We recorded 11,844 TSFs, of
which 257 had failed. Asia has nearly half of the world's TSFs, with North
America recording both in absolute (n = 107) and proportional (42%) terms
the largest number of TDFs (table S4).

North America stands out as the most affected 
region in terms of river length (198,400 km) 
and surface area of floodplains (43,100 km²)
(Fig. 3 and table S16). River channels and
floodplains are also extensively affected in
Oceania (river length, 106,100 km; floodplain
area, 33,800 km²), South America (river length,
81,700 km; floodplain area, 38,600 km²), and Asia
(river length, 60,900 km; floodplain area, 33,500
km²), but to a lesser extent in Europe (river
length, 14,800 km; floodplain area, 4900 km²)
and Africa (river length, 17,300 km; floodplain
area, 10,400 km²) (Fig. 3 and table S16). Asia, with
14.53 million people living in affected flood- 
plains, is the most vulnerable region in terms of 
human exposure, followed by North America 
(4.09 million), Europe (1.73 million), South 
America (1.53 million), Africa (1.19 million), 
and Oceania (0.42 million) (Fig. 3 and table S16). 
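As a quick arithmetic check, the continental figures quoted above can be summed. Using only numbers from this section, the river lengths add up to 479,200 km, the value given in the abstract. The floodplain areas add up to 164,300 km², against the 164,400 km² global total, and the exposed populations to 23.49 million, against the 23.48 million global estimate. Both small differences are consistent with rounding of the regional values.

```python
# Continental figures quoted in the Results (Fig. 3 and table S16).
river_km = {"North America": 198_400, "Oceania": 106_100, "South America": 81_700,
            "Asia": 60_900, "Europe": 14_800, "Africa": 17_300}
floodplain_km2 = {"North America": 43_100, "Oceania": 33_800, "South America": 38_600,
                  "Asia": 33_500, "Europe": 4_900, "Africa": 10_400}
people_millions = {"Asia": 14.53, "North America": 4.09, "Europe": 1.73,
                   "South America": 1.53, "Africa": 1.19, "Oceania": 0.42}

if __name__ == "__main__":
    print(sum(river_km.values()))                   # 479200
    print(sum(floodplain_km2.values()))             # 164300
    print(round(sum(people_millions.values()), 2))  # 23.49
```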

Undertaking the same audit for river catch- 
ments in which tailings dams have failed is 
less straightforward because data on dam height 
and volume of waste stored are only available 
for 165 of 257 recorded failures. Using this 
large but incomplete database, we calculate 
that, worldwide, a minimum of 5300 km of 
river channels and 4950 km² of floodplains
have been affected by TDFs (Fig. 3 and table 
S17). The number of people living on flood- 
plains that have been directly affected by 
TDFs is substantial (0.32 million) (Fig. 3 and 
table S17), but our modeling indicates that 
the impact of these events on river systems, 
and potential human population and livestock 
exposure, is two or three orders of magnitude 
smaller than in basins that have experienced 
inactive and/or active mining activity (Fig. 3 
and table S17). This reflects the small count of 
TDFs compared with the much larger number 
of active and inactive mines worldwide. 

Judging by the number of people living on 
floodplains affected by mining activity, pop- 
ulations in China (9.74 million) and the US 
(3.17 million) are potentially most at risk of ex- 
posure to contaminant metals and metalloids 
(tables S12 to S14). South Korea (0.79 million), 
Germany (0.35 million), and the UK (0.31 mil- 
lion) are ranked globally in the top 12 (table S13) 
in terms of population exposed to riverine- 
related metal hazards, with the environmental 
legacy of historical mining being most prob- 
lematic in western Europe. Countries that by 
world standards have relatively short rivers 
(e.g., Chile, Japan, New Zealand, South Korea, 
and UK), and particularly those with low sedi- 
ment loads (e.g., Germany and UK), have higher 
levels of river channel and floodplain con- 
tamination (table S15) as a consequence of 
limited dilution of sediment-associated mine 
waste (42). 
Fig. 2. Global river length, floodplain, and 100-year flood inundation areas affected by metal
mines and failed TSFs. Inactive mines are shown by solid yellow circles, active mines by open red
circles, and failed TSFs by purple triangles. The y-axis units are log₁₀ numbers. Symbols for inactive
and active mines indicate predicted values from the WAPHA model with 90% confidence intervals;
symbols for failed TSFs are observed values for total river length and floodplain areas affected by
257 documented TDFs. Inactive metal mines have a substantially larger global environmental impact on
river channels, floodplains, and valley floors located within the 100-year inundation zone than active
mines. Although the impact of failed TSFs on river systems worldwide is considerable, the combined
environmental effect of inactive and active mines on river channels and floodplains is estimated to be
30 to 90 times larger.


Implications for ecosystems and human health

This global survey of the environmental impacts of metal mining, and the consequent potential exposure risk of humans and livestock to toxic metals, reveals that an estimated 23 million people live on floodplains affected by potentially hazardous concentrations of toxic waste derived from historical and/or active upstream mining activity. However, because of incomplete reporting of mine locations and TDFs, most notably within China, India, and Russia, this is certainly an underestimate of the population at risk. In addition, the impacts of modern artisanal mining on river systems in the Global South are still very poorly documented, and documenting them should be the next critical step toward understanding the worldwide impact of mining.

Ecological and societal impacts of recent TDFs are locally catastrophic and have resulted in considerable loss of life (5). However, our assessment indicates that the number of people likely to be exposed to unacceptably high concentrations of toxic metals by these accidents (estimated to be >0.32 million) is almost 50 times smaller than the number living on river floodplains affected by historical (11.39 million) and active (12.08 million) metal mining. Exposure of workers directly engaged in current industrial metal mining and ore processing, smelting, and small-scale artisanal mining, which are three of the top five polluting industries worldwide (43), is not captured by our study. Preliminary modeling suggests that these industries pose a health risk to between 18 and 23 million people (43), which is comparable to the number of people whom we have estimated to live on mining-contaminated floodplains worldwide (table S16). Our georeferenced database and process-based predictive modeling provide tools for locating areas of highest potential exposure where monitoring, and potentially intervention, should be prioritized (tables S12 to S14), and they further highlight catchments (figs. S9 to S12) where new data are required. These would include locations in the historically mined regions of Andean South America, Australia (Victoria), Southeast and Central Asia (Pamir and Tien Shan), North America, and the UK (Wales and northern England; Fig. 4), in addition to those in Amazonia, sub-Saharan Africa, Southeast Asia, and southern and eastern China, where most of the world's new but poorly regulated mining operations are located (table S13).

We conclude that metal mining contamination of rivers and floodplains poses a possible substantial additional hazard to the health of both urban and rural communities in Africa and Asia, which are already burdened with water-related diseases. For the first industrial nations of western Europe and the US, this contamination constitutes a major and growing constraint to water and food security, compromises ecosystem services (44), and increases antimicrobial resistance in the environment (45). Global, multiscalar data with sufficient granularity are not presently available to quantify the potential risks of this contamination to ecosystem and human health. For example, food produced on contaminated floodplains and then exported will often enter a spatially extensive food chain, which will require new human biomonitoring and food basket studies (46). However, existing evidence already demonstrates that human health can be affected directly through the ingestion, inhalation, and absorption of metal-contaminated soil and indirectly through the quantity and quality of food derived from soil-based agriculture (9, 14, 47-49).

The increasing frequency of river flooding associated with anthropogenic global climate warming (50) can result in augmented erosion and sediment-associated metal remobilization from recently and historically contaminated floodplains (10, 24, 51), which in many parts of the world now constitute the principal source of metal contaminants in rivers. In addition, because of rapid urbanization and increasing settlement on floodplains worldwide (notably in sub-Saharan Africa and Southeast Asia), the proportion of the population exposed to flooding and contaminated flood waters rose by 20 to 24% from 2000 to 2015 (52). The expansion of lower-grade metal ore mining, which generates more waste per unit extracted, coupled with an increasing frequency of catastrophic TDFs (53), underlines the need to routinely incorporate outputs from large-scale mining databases (as reported here) into environmental monitoring programs and metal exposure pathway analyses. This will facilitate better management of metal contamination and risk of exposure downstream of historical and active metal mine sites.

RESEARCH | RESEARCH ARTICLE

[Fig. 3 plot: log10 human population, livestock, and irrigated areas affected by inactive metal mines, active metal mines, and failed tailings storage facilities, for North America (including Greenland), South America, Asia, Africa, Oceania (including Australia), and Europe.]
Fig. 3. Human population, number of livestock, and area of irrigated land affected by metal mines and failed TSFs. Inactive mines are shown by solid yellow circles, active mines by open red circles, and failed TSFs by purple triangles. The y-axis units are log10 numbers. Symbols for inactive and active mines indicate predicted values from the WAPHA model with 90% confidence intervals; symbols for failed TSFs are observed values for irrigated areas (in square kilometers) affected by 257 documented TDFs.

Macklin et al., Science 381, 1345-1350 (2023)    22 September 2023

REFERENCES AND NOTES
1. PwC, "Mine 2018: Tempting times" (PwC, 2018); https://www.pwc.com/id/en/publications/assets/eumpublications/mining/mine-2018.pdf.
2. U. Förstner, "Introduction," in Environmental Impacts of Mining Activities: Emphasis on Mitigation and Remedial Measures, J. M. Azcue, Ed. (Springer, 1999), pp. 1-3.
3. J. Syvitski et al., Nat. Rev. Earth Environ. 3, 179-196 (2022).
4. B. G. Lottermoser, Mine Wastes: Characterization, Treatment and Environmental Impacts (Springer, ed. 3, 2010).
5. K. Hudson-Edwards, Science 352, 288-290 (2016).


6. J. E. Gall, R. S. Boyd, N. Rajakaruna, Environ. Monit. Assess. 187, 201 (2015).
7. J. R. Miller, K. A. Hudson-Edwards, P. J. Lechler, D. Preston, M. G. Macklin, Sci. Total Environ. 320, 189-209 (2004).
8. D. Xu et al., Sci. Rep. 12, 9211 (2022).
9. M. Roy, L. M. McDonald, Land Degrad. Dev. 26, 785-792 (2015).
10. S. A. Foulds et al., Sci. Total Environ. 476-477, 165-180 (2014).
11. S. Giri, A. K. Singh, J. Food Sci. Technol. 57, 1415-1420 (2020).
12. H. Ali, E. Khan, Environ. Chem. Lett. 16, 903-917 (2018).
13. Y. Jia, L. Wang, Z. Qu, Z. Yang, Environ. Sci. Pollut. Res. Int. 25, 7012-7020 (2018).
14. S. Mwelwa, D. Chungu, F. Tailoka, D. Beesigamukama, C. Tanga, Sci. Total Environ. 881, 163150 (2023).
15. J. P. Grattan et al., Sci. Total Environ. 573, 247-257 (2016).
16. D. Kossoff et al., Appl. Geochem. 51, 229-245 (2014).
17. J. Lewin, B. Davies, "Interactions between channel change and historic mining sediments" in River Channel Changes, K. J. Gregory, Ed. (Wiley, 1977), pp. 353-367.
18. J. Lewin, M. G. Macklin, "Metal mining and floodplain sedimentation in Britain," in International Geomorphology Part 1, V. Gardiner, Ed. (Wiley, 1987), pp. 1009-1027.
19. W. L. Graf et al., Catena 18, 567-582 (1991).
20. M. G. Macklin, J. Lewin, Earth Surf. Process. Landf. 14, 233-246 (1989).
21. M. G. Macklin, R. B. Dowsett, Catena 16, 135-151 (1989).
22. M. G. Macklin, K. A. Hudson-Edwards, E. J. Dawson, Sci. Total Environ. 194-195, 391-397 (1997).
23. J. M. Martin, M. Meybeck, Mar. Chem. 7, 173-206 (1979).
24. I. A. Dennis, M. G. Macklin, T. J. Coulthard, P. A. Brewer, Hydrol. Processes 17, 1641-1657 (2003).
25. M. G. Macklin et al., Geomorphology 79, 423-447 (2006).
26. I. A. Dennis, T. J. Coulthard, P. Brewer, M. G. Macklin, Earth Surf. Process. Landf. 34, 453-466 (2009).
27. D. Ciszewski, T. M. Grygar, Water Air Soil Pollut. 227, 239 (2016).
28. Materials and methods are available as supplementary materials.
29. Water and Planetary Health Analytics, "Global metal mines database" (2023); https://doi.org/10.5061/dryad.j3tx95xmg.




[Fig. 4 maps: (A) UK index map and (B) River Swale catchment, northern England (about 54°10'N to 54°30'N, 2°00'W to 1°20'W); (C) Eastern European index map (Romania, Bulgaria) and (D) Bulgarian site (about 42°10'N to 42°30'N, 24°20'E to 25°00'E). Scale bar to 300 km. Legend: inactive metal mine; active metal mine; national border; stream network; GFPLAIN250 floodplain; modelled contamination (predicted, lower CI, upper CI).]
Fig. 4. Examples of WAPHA modeling and mapping of contaminated floodplains and river channel reaches linked to inactive and active mines. (A and C) Regional index maps for the UK and Eastern European sites, respectively. (B and D) Examples of WAPHA modeling and mapping of contaminated floodplains and river channel reaches linked to inactive and active mines in the River Swale in northern England (B) and in Bulgaria (D). Inactive mines are shown by solid yellow circles, active mines by open red circles.
academic and gray literature are stored in the WAPHA database [https://doi.org/10.5061/dryad.j3tx95xmg (29)]. Modeling was implemented procedurally in MATLAB v9.9.0 (R2020b) (37) with the open-source TopoToolbox MATLAB program for the analysis of digital elevation models (https://topotoolbox.wordpress.com). The modeling workflow is presented in fig. S8, with example code available in the WAPHA database [https://doi.org/10.5061/dryad.j3tx95xmg (29)]. License information: Copyright © 2023 the authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original US government works. https://www.science.org/about/science-licenses-journal-article-reuse
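As a rough illustration of the kind of DEM-based routing such a workflow performs, the sketch below traces the flow path downstream of a mine cell on a D8 flow-direction grid until a maximum distance is reached. It is not the WAPHA model or TopoToolbox code. The power-of-two D8 encoding (a common GIS convention), the toy grid, the 30 m cell size, and the distance cutoff are all hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical D8 encoding (rows increase southward): code -> (row, col) step.
D8_STEPS: Dict[int, Cell] = {
    1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
    16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1),
}


def trace_downstream(flow_dir: List[List[int]], start: Cell,
                     cell_size_m: float, max_distance_m: float) -> List[Cell]:
    """Return the cells on the flow path from `start`, stopping at a sink
    (code not in D8_STEPS), the grid edge, a loop, or `max_distance_m`."""
    rows, cols = len(flow_dir), len(flow_dir[0])
    r, c = start
    path, visited = [start], {start}
    distance = 0.0
    while True:
        code = flow_dir[r][c]
        if code not in D8_STEPS:
            break  # sink or no-data cell
        dr, dc = D8_STEPS[code]
        # Diagonal steps are longer by a factor of sqrt(2).
        step = cell_size_m * (2 ** 0.5 if dr and dc else 1.0)
        if distance + step > max_distance_m:
            break
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in visited:
            break  # left the grid or hit a flow loop
        distance += step
        r, c = nr, nc
        visited.add((r, c))
        path.append((r, c))
    return path


# Toy 4 x 4 grid: water flows east along row 0, then south down column 3.
grid = [
    [1, 1, 1, 4],
    [0, 0, 0, 4],
    [0, 0, 0, 4],
    [0, 0, 0, 0],
]
print(trace_downstream(grid, (0, 0), cell_size_m=30.0, max_distance_m=150.0))
```

In a full analysis, the cells on such a path would then be intersected with floodplain extents (e.g., GFPLAIN250) and population layers; those steps are omitted here.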


SUPPLEMENTARY MATERIALS 


science.org/doi/10.1126/science.adg6704 
Materials and Methods 

Figs. S1 to S12

Tables S1 to S17 

References (54-152) 

MDAR Reproducibility Checklist 


Submitted 25 January 2023; accepted 18 August 2023
10.1126/science.adg6704 




RESEARCH 


MEMBRANES 


Solid-solvent processing of ultrathin, highly loaded 
mixed-matrix membrane for gas separation 


Guining Chen, Cailing Chen, Yanan Guo, Zhenyu Chu, Yang Pan, Guozhen Liu, Gongping Liu, Yu Han, Wanqin Jin, Nanping Xu


Mixed-matrix membranes (MMMs) that combine processable polymer with more permeable and selective filler have potential for molecular separation, but it remains difficult to control their interfacial compatibility and achieve ultrathin selective layers during processing, particularly at high filler loading. We present a solid-solvent processing strategy to fabricate an ultrathin MMM (thickness less than 100 nanometers) with filler loading up to 80 volume %. We used polymer as a solid solvent to dissolve metal salts to form an ultrathin precursor layer, which immobilizes the metal salt, regulates its conversion to a metal-organic framework (MOF), and provides adhesion to the MOF in the matrix. The resultant membrane exhibits fast gas-sieving properties, with hydrogen permeance and/or hydrogen-carbon dioxide selectivity one to two orders of magnitude higher than those of state-of-the-art membranes.


Subnanometer solid-state channels that selectively transport small molecules have shown potential in membrane-based separation (1, 2). Despite their dominance in the market, polymeric membranes generally do not possess regular and continuous subnanometer channels, leading to an inherent trade-off between permeability and selectivity (3). Nanoporous crystalline materials, represented by zeolites and metal-organic frameworks (MOFs), can address this challenge by providing excellent permeability and selectivity through their well-defined pore systems (4-6). With existing pure crystalline membranes, however, it remains difficult to control intergranular defects and maintain processability for large-scale implementation (7). Alternatively, mixed-matrix membranes (MMMs) have emerged as a promising class of membrane materials with the potential to combine the processability of polymers with the excellent transport properties of crystalline materials (8-10).

MMMs are commonly fabricated through a solution-mixing strategy. This involves casting a suspension containing solvent, polymer, and MOF filler onto a glass plate or porous substrate and allowing the solvent to evaporate, resulting in either a micrometer-thick self-standing membrane or a submicrometer-thick composite membrane. Achieving interfacial compatibility between the polymer matrix and the MOF filler is challenging, particularly when the filler loading is high (>30 to 40 vol %) (1, 7). Issues such as filler agglomeration, sedimentation, and filler-polymer interfacial defects may arise during solvent evaporation. Ultrathin MMMs are essential for practical applications, as evidenced with polymeric membranes (11), but they are more difficult to fabricate than self-standing MMMs because of the more pronounced agglomeration of smaller nanofillers and the severe penetration of the casting solution into the substrate pores.

We present a solid-solvent processing (SSP) approach to fabricating thin, highly loaded MMMs. In contrast to existing methods, the polymer matrix acts as a solid solvent that uniformly dissolves and immobilizes metal salts after evaporation of the metal salt@polymer aqueous solution, forming an ultrathin and spatially continuous metal salt@polymer precursor layer (Fig. 1). The high cosolubility of the metal salt and polymer not only enables the ultrathinning process but also allows for high loading of metal salt in the polymer matrix. After a ligand vapor treatment, the metal salt in the precursor layer undergoes in situ conversion to nanoporous MOF crystals, leading to an ultrathin, highly loaded MOF@polymer MMM. During this process, the solid solvent maintains the MMM integrity and inhibits the agglomeration of MOF particles as the loading increases. Additionally, the flexible polymer segments attach tightly to the generated MOF particles, resulting in an intact MOF-polymer interface.

To demonstrate the SSP strategy, we chose a typical hexafluorosilicate (SIFSIX)-series MOF, which contains fluorosilicate anions (SiF6^2−) and has shown potential for use in gas separation (12). Solid solvents, including polyethylene glycol (PEG) and polyvinyl alcohol (PVA), were selected owing to their good compatibility with, and solubility of, fluorosilicate in aqueous solution. Upon heating the CuSiF6@polymer precursor and pyrazine (pyz) ligand in a closed reactor, the ligand vaporizes and diffuses into the precursor layer. As shown in Fig. 1, inset, the SiF6^2−-pillared copper(II) center is octahedrally coordinated to four nitrogen atoms of the pyrazine, where protruding pyrazine planes are stacked above each other along the a axis. The in situ-formed MOF fillers in the MMM possess a window aperture size of 2.5 by 2.2 Å (13), which is attractive for hydrogen-carbon dioxide (H2-CO2) molecular sieving separation (fig. S1). The detailed fabrication procedure of the MMM is shown in fig. S2.


Fabrication of MOF@polymer MMMs 


The surface pore size of the polyacrylonitrile (PAN) substrate is ~20 nm (fig. S3). After spin-coating the CuSiF6@PEG precursor solution, a defect-free, smooth, polymer-like surface was observed (Fig. 2A). After the reaction of CuSiF6 with pyrazine vapor, granular protrusions appeared and resulted in a rougher membrane surface (Fig. 2B and fig. S4). The morphology evolution and the color change from light green to blue (Fig. 2, A and B, insets) indicated that the CuSiF6 salts in the precursor were converted into Cu(SiF6)(pyz)3 MOFs. Membrane fabrication can be controlled during precursor solution preparation and coating through three key parameters: polymer molecular weight, metal salt:polymer mass ratio, and number of spin-coating cycles (supplementary materials, materials and methods). By controlling these solution properties and coating parameters, the CuSiF6@PEG precursor and its Cu(SiF6)(pyz)3@PEG MMM can be fabricated as thin as 50 nm (Fig. 2C) without visible defects.

To track the conversion of the metal salt into a MOF, we used transmission electron microscopy (TEM) and energy-dispersive x-ray spectroscopy (EDX) mapping to visualize the precursor and membrane composition. In the CuSiF6@PEG precursor, these showed a uniform distribution and effective immobilization of CuSiF6 metal salt in the PEG matrix and an intact metal salt-polymer interface (Fig. 2D and fig. S5). The magnified view in Fig. 2E showed CuSiF6 dispersed as nanoparticles 5 to 10 nm in size, and the narrow d-spacing of 0.21 nm implied a nonporous structure. Because the Cu(SiF6)(pyz)3 MOF is very sensitive to electron radiation, we studied its analog, Cu(SiF6)(bpy)2, constructed from CuSiF6 and 4,4'-bipyridine (bpy), because it is more stable at high voltage (14). In the resulting MMM, the scattered CuSiF6 nanoparticles disappeared and were replaced by ordered and continuous crystals. The increased interplanar distances verified the formation of subnanometer channels. The d-spacings of 0.80 and 0.55 nm corresponded to the (002) and (020) planes of the Cu(SiF6)(bpy)2 structure (Fig. 2F). The Cu, F, and C element distributions from the EDX mapping indicated that the efficient gas transport channels were preserved (fig. S6). Additional results supporting the structural conversion are provided in fig. S7.
Fig. 1. Schematic of the mixed-matrix membrane (MMM) fabricated by a solid-solvent processing (SSP) strategy.


We used the ultralow-dose high-resolution TEM (HRTEM) technique (15) to image the in situ-formed MOF structure within the PEG MMM. The image shows that the crystalline MOF adheres tightly to the amorphous PEG (Fig. 2G), forming a seamless MOF-polymer interface. This can be attributed to the excellent compatibility between the metal salt and the polymer and to the flexibility of the polymer. The lattice fringes exhibited by the MOF phase, with interlayer distances of 0.50 and 0.35 nm, correspond to the (111) and (200) crystal planes of Cu(SiF6)(pyz)3, respectively (Fig. 2H and fig. S8). The HRTEM image matches well with the [011]-projected structure model of Cu(SiF6)(pyz)3 (Fig. 2I), confirming the successful formation of the desired MOF structure within the PEG matrix (16).
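To relate lattice spacings such as those above to diffraction angles, Bragg's law, nλ = 2d sin θ, can be applied. The sketch below assumes first-order reflections and Cu Kα radiation (λ = 0.15406 nm); the radiation source is an assumption, since it is not stated here, and the HRTEM spacings themselves are measured directly in real space.

```python
import math

CU_K_ALPHA_NM = 0.15406  # assumed X-ray wavelength (Cu K-alpha), in nm


def two_theta_deg(d_spacing_nm: float, wavelength_nm: float = CU_K_ALPHA_NM,
                  order: int = 1) -> float:
    """Diffraction angle 2θ in degrees from Bragg's law, nλ = 2d sin θ."""
    s = order * wavelength_nm / (2.0 * d_spacing_nm)
    if not 0.0 < s <= 1.0:
        raise ValueError("no reflection possible for this d-spacing")
    return 2.0 * math.degrees(math.asin(s))


# Interlayer distances reported for the (111) and (200) planes.
for plane, d in (("(111)", 0.50), ("(200)", 0.35)):
    print(f"{plane}: d = {d:.2f} nm -> 2θ ≈ {two_theta_deg(d):.1f}°")
```

Smaller d-spacings map to larger angles, so the larger-spaced (111) planes would appear at lower 2θ than the (200) planes under this assumed wavelength.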

We also tracked the crystal transition in the solid solvent using crystallography and spectroscopy. The CuSiF6@PEG precursor showed distinct x-ray diffraction (XRD) peaks consistent with a metal salt. After ligand vapor treatment, there was a sharp transition from CuSiF6 to Cu(SiF6)(pyz)3 (13), demonstrating detectable MOF nucleation with the assistance of the solid-state polymer solvent (Fig. 3A). X-ray photoelectron spectroscopy (XPS) and infrared (IR) spectra also confirmed the presence of the pyrazine ligand and the structural conversion (fig. S9). Meanwhile, the disappearance of the CuSiF6 peaks indicated a high conversion rate of metal salt to MOF, which was further verified with positron annihilation spectroscopy. The shift toward longer positron lifetimes indicated a substantial increase in subnanometer cavities during the conversion of the CuSiF6@PVA precursor to the Cu(SiF6)(pyz)3@PVA MMM (Fig. 3B).
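Positron lifetimes are commonly converted into cavity sizes with the semi-empirical Tao-Eldrup model, named here as a standard technique; the authors' own analysis may differ. The model links the ortho-positronium lifetime τ to a spherical cavity radius R through 1/τ = 2 ns⁻¹ [1 − R/R0 + (1/2π) sin(2πR/R0)], with R0 = R + 0.1656 nm. The sketch below evaluates and inverts this relation; the lifetimes in the loop are hypothetical.

```python
import math

DELTA_R_NM = 0.1656  # empirical electron-layer thickness in the Tao-Eldrup model


def ops_lifetime_ns(radius_nm: float) -> float:
    """Ortho-positronium lifetime (ns) in a spherical cavity of radius R (nm)."""
    x = radius_nm / (radius_nm + DELTA_R_NM)
    rate = 2.0 * (1.0 - x + math.sin(2.0 * math.pi * x) / (2.0 * math.pi))
    return 1.0 / rate


def cavity_radius_nm(lifetime_ns: float, lo: float = 1e-6, hi: float = 2.0,
                     tol: float = 1e-9) -> float:
    """Invert ops_lifetime_ns by bisection; the lifetime grows with radius."""
    if not ops_lifetime_ns(lo) < lifetime_ns < ops_lifetime_ns(hi):
        raise ValueError("lifetime outside the bracketed radius range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ops_lifetime_ns(mid) < lifetime_ns:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for tau in (1.5, 2.0, 2.5):  # hypothetical lifetimes, ns
    print(f"tau = {tau} ns -> R ≈ {cavity_radius_nm(tau):.3f} nm")
```

Bisection is safe here because the bracketed term decreases monotonically with R/R0, so a longer lifetime always corresponds to a larger cavity.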

After confirming the in situ conversion of CuSiF6 to Cu(SiF6)(pyz)3 MOF in the polymer matrix, we determined the actual MOF loading of the MMM using thermogravimetric analysis (fig. S10; detailed calculations are described in the supplementary materials). We found that by simply adjusting the molecular weight of the polymer and the metal salt:polymer ratio, the MOF loading can exceed 50 vol % and reach up to 80 vol % (Fig. 3C) for Cu(SiF6)(pyz)3@PEG, Cu(SiF6)(bpy)2@PEG, or Cu(SiF6)(pyz)3@PVA MMMs (figs. S11 to S14), which is rarely accessible through conventional MMM preparation methods. For Cu(SiF6)(pyz)3@PEG MMMs, PEG with a suitable molecular weight (Mw) of 10,000 to 70,000 g/mol can alleviate the penetration of the CuSiF6@PEG coating solution into the substrate pores and thus form an ultrathin precursor layer (fig. S15). The MOF content correlated with the metal salt:PEG mass ratio (fig. S16). Although the Cu(SiF6)(pyz)3@PEG MMM with a CuSiF6/PEG mass ratio of 10:1 demonstrated the highest MOF loading of 85.6 vol %, the PEG was insufficient to compensate for the defects between the MOF particles, and MOF particles were exposed on the substrate (fig. S17). To maximize the MOF loading while ensuring full compensation of the grain defects, PEG with an Mw of 10,000 g/mol and a CuSiF6/PEG mass ratio of 5:1 were considered optimal for the Cu(SiF6)(pyz)3@PEG MMM. For PVA, the optimal Mw was 75,000 g/mol with a CuSiF6/PVA mass ratio of 2.5:1 (fig. S18). By increasing the number of spin-coating cycles from 10 to 40, the thickness of the Cu(SiF6)(pyz)3@PEG MMM could be tuned from 50 to 90 nm (fig. S19), and the integrity of the MMM layer was enhanced, as reflected by the H2-CO2 perm-selectivity (figs. S20 and S21).
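Thermogravimetry yields mass fractions, whereas the loadings above are reported in vol %. The authors' actual calculation is in the supplementary materials; the sketch below only shows the generic conversion, assuming ideal (additive) component volumes and hypothetical densities and masses.

```python
def volume_fraction(mass_filler: float, rho_filler: float,
                    mass_polymer: float, rho_polymer: float) -> float:
    """Filler volume fraction, assuming the component volumes are additive."""
    v_filler = mass_filler / rho_filler      # cm^3 if mass in g, rho in g/cm^3
    v_polymer = mass_polymer / rho_polymer
    return v_filler / (v_filler + v_polymer)


# Hypothetical example: 3 g MOF (1.5 g/cm^3) per 1 g polymer (1.2 g/cm^3).
phi = volume_fraction(3.0, 1.5, 1.0, 1.2)
print(f"MOF loading ≈ {100 * phi:.1f} vol %")
```

Because a MOF is usually less dense than its host polymer's mass share would suggest only when densities differ, the vol % and wt % values can diverge noticeably; the densities used in practice matter more than the formula.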


Transport property of MOF@polymer MMMs 


To understand molecular transport in MMM, 
we established an ideal resistance model closely 
related to MOF loading. As schematically shown 
in Fig. 3D, as the MOF loading exceeds 50 vol %, 
MOF particles will be the dominant phase, 
eventually forming interconnected MOF nano- 
channels in the polymer matrix and thereby 
dominating the molecular permeation (Fig. 3E 
and tables S1 to S3) (4, 9, 10, 17-37). Within 
such a context, the selective molecular trans- 
port is mainly governed by MOF rather than 
polymer, which is expected to achieve attractive 
transport properties close to those of a pure 
MOF crystalline membrane. 
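The resistance picture can be bracketed with the textbook parallel and series models for a two-phase membrane. This is a generic illustration with hypothetical permeabilities, not the authors' ideal resistance model: in the parallel limit the two phases act as side-by-side pathways, and in the series limit as stacked layers whose resistances add.

```python
def parallel_permeability(phi: float, p_filler: float, p_polymer: float) -> float:
    """Upper bound: filler and polymer act as parallel pathways."""
    return phi * p_filler + (1.0 - phi) * p_polymer


def series_permeability(phi: float, p_filler: float, p_polymer: float) -> float:
    """Lower bound: filler and polymer act as layers in series."""
    return 1.0 / (phi / p_filler + (1.0 - phi) / p_polymer)


# Hypothetical H2 permeabilities (arbitrary units): fast MOF, slow polymer.
P_MOF, P_POLYMER = 1000.0, 10.0
for phi in (0.2, 0.5, 0.8):
    lo = series_permeability(phi, P_MOF, P_POLYMER)
    hi = parallel_permeability(phi, P_MOF, P_POLYMER)
    print(f"phi = {phi:.1f}: series {lo:7.1f}, parallel {hi:7.1f}")
```

The wide gap between the two bounds shows why connectivity matters: once the filler forms continuous pathways, roughly the high-loading regime described above, transport moves toward the parallel-like limit in which the MOF channels dominate.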

We observed a permeation rate cut-off between H2 and CO2 in the 80.4 vol % Cu(SiF6)(pyz)3@PEG MMM, which was consistent with the molecular size sieving of the Cu(SiF6)(pyz)3 pore structure (fig. S22). In single-gas permeation, the permeance of N2 and CH4 surpassed that
of CO2, and the same trend held for C,H, over C2H,, which is likely attributable to the framework flexibility and the strong adsorption of CO2 and C,H, hindering their diffusion in MOF
channels (37). In mixed-gas permeation, however, lower gas permeances and reversed CO2-CH4 selective permeation were observed, which can be attributed to competitive adsorption, with preadsorbed CO2 hindering the diffusion of CH4. Similar results were observed in other MOF-based membranes with molecular sieving pores (38). To elucidate how membrane transport properties vary with MOF content, the separation performance of a series of MOF@polymer MMMs was plotted on a permeance-selectivity trade-off plot with an H2-CO2 separation upper-bound line (Fig. 3E and fig. S23). The optimal Cu(SiF6)(pyz)3@PEG MMM with 80.4 vol % MOF exhibited an H2 permeance of 3640 GPU and an H2-CO2 selectivity of 76.1, which are comparable with those of pure MOF membranes. Compared with state-of-the-art MOF-based MMMs, our MOF@polymer MMMs demonstrated a substantial advantage in both permeance and selectivity.

Fig. 2. Morphologies of MMMs fabricated by using SSP. (A) SEM image of the top surface of the CuSiF6@PEG precursor. (Inset) Digital photo of the CuSiF6@PEG precursor. Scale bar, 5 mm. (B) Surface SEM image of Cu(SiF6)(pyz)3@PEG MMM. (Inset) Digital photo of Cu(SiF6)(pyz)3@PEG MMM. Scale bar, 5 mm. (C) Cross-section SEM of Cu(SiF6)(pyz)3@PEG MMM. (D) TEM image of CuSiF6@PEG precursor. (E) HRTEM image of CuSiF6@PEG precursor. (Inset) Red square area magnified. (F) HRTEM image of Cu(SiF6)(bpy)2@PEG MMM. (Inset) Red square
area magnified. (G) Ultralow-dose HRTEM image and (inset) selected-area electron diffraction pattern of Cu(SiF6)(pyz)3@PEG MMM, acquired along the [011] zone axis of the Cu(SiF6)(pyz)3 crystal. (H) Enlarged HRTEM image of the red square area in (G). Two sets of crystal planes are labeled based on the crystal structure of Cu(SiF6)(pyz)3. (I) Structural model of MOF Cu(SiF6)(pyz)3 projected along the [011] direction. Atom colors are gray, carbon; white, hydrogen; blue, nitrogen; pink, fluorine; aquamarine, copper; and yellow, silicon.

We endeavored to corroborate the molecular transport mechanism through subnanometer channels on the basis of density functional theory (DFT) calculations. We calculated the diffusion properties of H2 and CO2 gas molecules through Cu(SiF6)(pyz)3 using the climbing image nudged elastic band (CI-NEB) method (39). When a CO2 molecule transports through the channel, its molecular axis tends to be parallel with the axis of the channel. The H2 molecule prefers to interact with the F atom of the MOF, which tilts its molecular axis relative to the axis of the channel (Fig. 3F and fig. S24). As a result, CO2 needs to overcome a higher energy barrier (7.61 kcal/mol) than does H2 (6.46 kcal/mol) when transporting through the channel. The van der Waals (vdW) surface overlap between Cu(SiF6)(pyz)3 and CO2 is larger than that between Cu(SiF6)(pyz)3 and H2, confirming that CO2 is transported with more difficulty than H2. This result, together with the calculated transition barriers, demonstrates the role of the size-sieving effect in the Cu(SiF6)(pyz)3 nanochannels for molecular transport.
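Two back-of-envelope checks on the transport numbers in this section, offered as sketches rather than results from the study. First, the H2 permeance (3640 GPU) and H2-CO2 selectivity (76.1) of the optimal 80.4 vol % Cu(SiF6)(pyz)3@PEG MMM imply a CO2 permeance of about 48 GPU, and 1 GPU = 10⁻⁶ cm³(STP) cm⁻² s⁻¹ cmHg⁻¹ ≈ 3.35 × 10⁻¹⁰ mol m⁻² s⁻¹ Pa⁻¹. Second, under the hypothetical assumption of identical Arrhenius prefactors for both gases, the DFT barriers of 6.46 and 7.61 kcal/mol translate into a hopping-rate ratio of exp(ΔE/RT) at 298 K.

```python
import math

# 1 GPU = 1e-6 cm3(STP) cm^-2 s^-1 cmHg^-1, converted to SI:
# cm3(STP) -> mol, cm^-2 -> m^-2, cmHg^-1 -> Pa^-1.
MOLAR_VOLUME_STP_CM3 = 22414.0  # cm3/mol at 0 °C and 1 atm
PA_PER_CMHG = 1333.22
GPU_TO_SI = 1e-6 / MOLAR_VOLUME_STP_CM3 * 1e4 / PA_PER_CMHG  # mol m^-2 s^-1 Pa^-1

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def slower_gas_permeance(fast_permeance_gpu: float, selectivity: float) -> float:
    """Permeance of the slower gas implied by the separation selectivity."""
    return fast_permeance_gpu / selectivity


def rate_ratio(barrier_slow: float, barrier_fast: float,
               temp_k: float = 298.15) -> float:
    """exp(ΔE/RT) for barriers in kcal/mol, assuming identical prefactors."""
    return math.exp((barrier_slow - barrier_fast) / (R_KCAL * temp_k))


p_h2 = 3640.0
p_co2 = slower_gas_permeance(p_h2, 76.1)
print(f"CO2 permeance ≈ {p_co2:.1f} GPU")
print(f"H2 permeance ≈ {p_h2 * GPU_TO_SI:.2e} mol m^-2 s^-1 Pa^-1")
print(f"H2/CO2 hopping-rate ratio ≈ {rate_ratio(7.61, 6.46):.1f}")
```

Under these assumptions, the barrier difference alone corresponds to a factor of roughly 7; prefactors, adsorption, and framework flexibility, which the sketch ignores, also shape measured selectivities.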

The selective permeation of H2 over CO2 in the nanochannels of the MOF@polymer MMM is promising for applications in hydrogen production. At present, only a few membrane materials (such as polybenzimidazole (PBI), carbon molecular sieves (CMS), and MOFs) can withstand the harsh conditions of syngas processing (such as high temperature). However, the relatively low permeability of PBI and the poor processability of MOF and CMS membranes may hamper their wider application. Our MOF@polymer MMM possessed abundant molecular sieving nanochannels, provided H2-CO2 separation performance approaching that of molecular sieves, and maintained the processability of polymer membranes. Because the working temperature exceeds 100°C, we chose the Cu(SiF6)(pyz)3@PVA MMM, which exhibits a higher glass transition temperature (Tg) and higher thermal stability than the Cu(SiF6)(pyz)3@PEG MMM (Fig. 4A). Compared with the PEG-based MMM, the uniform MOF nanocrystals observed in the PVA-based MMM at elevated temperature also confirmed the higher stability of the Cu(SiF6)(pyz)3@PVA MMM (fig. S25).

Fig. 3. Nanochannels regulation and membrane transport property and mechanism. (A) XRD patterns of CuSiF6, CuSiF6@PEG precursor, simulated Cu(SiF6)(pyz)3, and Cu(SiF6)(pyz)3@PEG MMM. (B) Positron annihilation lifetime spectra of CuSiF6@PVA precursor and Cu(SiF6)(pyz)3@PVA MMM. (C) Metal salt conversion rate and the corresponding MOF volume loading in MOF@PEG MMM with different Mw of PEG (fixed CuSiF6:polymer = 5:1) and CuSiF6/PEG mass ratios (fixed PEG Mw of 10,000). (D) Transport resistance schematic of Cu(SiF6)(pyz)3@polymer MMM with different MOF loadings. (E) Binary H2-CO2 separation performance of Cu(SiF6)(pyz)3@polymer MMMs with different MOF loadings at 25°C and 1.5 bar and comparison with that of pure MOF membranes and MOF-based MMMs. Error bars indicate the standard deviation from three different samples. The data for those points are supplemented by tables S1 to S3 (4, 17-37). (F) (Left) CO2 and H2 diffusion pathways (indicated with arrows) in the MOF structure. To depict the positions of CO2 molecules along the diffusion pathway, the oxygen atoms of CO2 molecules are shown by the colors green, pink, and ice blue, respectively. (Right) The superimposed vdW surfaces of Cu(SiF6)(pyz)3 and CO2 (red) and H2 (green) molecules.

Chen et al., Science 381, 1350-1356 (2023)

Fig. 4. H2-CO2 separation performance of MMMs and the universality of SSP strategy. (A) Tg of polymer and MOF@polymer MMM. (B) Effect of operating temperature on the separation performance of 59.6 vol % Cu(SiF6)(pyz)3@PVA MMM at 1.5 bar. (C) Binary H2-CO2 separation performance of 59.6 vol % Cu(SiF6)(pyz)3@PVA MMMs at 120°C and 1.5 bar, and 64.4 vol % Ni(NbOF5)(pyz)3@PVA MMMs at 120°C and 1 MPa with a 3 vol % water vapor content, and comparison with other membranes (tables S4 and S5) (33, 40-48). Error bars indicate the standard deviation from three different samples, in which some error bars are smaller than the symbols. (D) Binary H2-CO2 separation

We investigated the effect of operating tem- 
perature on the H,-CO, separation performance 
of 59.6 vol % Cu(SiFs)(pyz)3@PVA membrane 
(Fig. 4B). The elevated CO, permeance and re- 
duced H,-CO, selectivity as the temperature 
increased to 120°C could be attributed to the 
more intense molecular motion and weak- 
ened molecular sieving caused by the more 
flexible skeleton. As the temperature decreased 
from 120°C to room temperature, the mem- 
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brane performance was almost unchanged 
compared with the original membrane, with- 
out noticeable aging or coarsening (fig. S26). 
As shown in Fig. 4C, the membrane exhibited 
excellent H2-CO, separation performance, sur- 
passing not only the 100°C upper bound of 
conventional polymeric membranes but also 
the permeance of MOF-based MMMs, advanced 
thermally rearranged (TR) polymers, and the 
benchmark PBI membranes (tables S4 and S5) 
(33, 40-48). The Cu(SiF¢)(pyz)3@PVA MMM 
was comparable with advanced CMS and MOF 
membranes and entered the attractive area in 
H»-CO, separation (Hy permeance, 300 GPU; 
H,-COz, selectivity, 30) (fig. S27 and tables S6 and 
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Membrane location 


performance of Cu(SiF¢)(pyz)3@PEG MMM before and after curling at 25°C and 
1.5 bar. (Inset) Digital photo of the curved membrane. (E) XRD patterns and SEM 
images of ZIF@PEG MMM after 2-methylimidazole vapor-induced conversion 
from Zn(NO3)2@PEG precursor. (F) Separation performance of MOF@PEG MMMs 
with different MOF species at 25°C and 1.5 bar. Error bars indicate the standard 
deviation from three different samples. (G) Digital photo of (left) PAN hollow- 
fiber substrate, (right) Cu(SiFe)(pyz)3@PVA MMM, and its cross-sectional SEM 
images. (H) Digital photo of scaled-up fabricated Cu(SiF¢)(pyz)3@PVA MMM and 
its binary Hz-COz separation performance of 12 portions at 25°C and 1.5 bar. 
S7) (49). In contrast to the earlier feed conditions, which lacked water vapor as well as high temperature and pressure, we further investigated the H2-CO2 separation performance under more realistic conditions (120°C and 1 MPa with 3 vol % water vapor) using a 64.4 vol % MOF@PVA MMM composed of a more water-stable MOF with Ni(II)-pyrazine coordination units and (NbOF5)2− pillars (supplementary materials, materials and methods). The membrane performance (H2 permeance, 1118 GPU; H2/CO2 selectivity, 32.7) was stable and still in the attractive area, despite a slight decrease of H2 permeance and H2-CO2 selectivity in the presence of water vapor (fig. S28), showing great potential for precombustion CO2 capture and blue hydrogen production.
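The permeance and selectivity figures quoted above can be related with a few lines of arithmetic. The sketch below is a minimal illustration, not the authors' analysis. It assumes that the reported H2/CO2 selectivity is the ratio of H2 to CO2 permeance, that the attractive-area thresholds (300 GPU and 30) are inclusive, and it uses the standard conversion 1 GPU = 10^-6 cm3(STP) cm^-2 s^-1 cmHg^-1, about 3.35 × 10^-10 mol m^-2 s^-1 Pa^-1. All function names are hypothetical.

```python
"""Minimal sketch relating permeance (GPU), selectivity, and the
"attractive area" thresholds quoted in the text (assumed inclusive)."""

# 1 GPU = 1e-6 cm3(STP) cm^-2 s^-1 cmHg^-1, about 3.35e-10 mol m^-2 s^-1 Pa^-1
GPU_TO_SI = 3.35e-10


def gpu_to_si(permeance_gpu: float) -> float:
    """Convert a permeance from GPU to mol m^-2 s^-1 Pa^-1."""
    return permeance_gpu * GPU_TO_SI


def selectivity(permeance_fast: float, permeance_slow: float) -> float:
    """Selectivity taken as a permeance ratio (assumption, see lead-in)."""
    if permeance_slow <= 0:
        raise ValueError("slow-gas permeance must be positive")
    return permeance_fast / permeance_slow


def implied_slow_permeance(permeance_fast: float, sel: float) -> float:
    """Back out the slow-gas (here CO2) permeance from a reported selectivity."""
    if sel <= 0:
        raise ValueError("selectivity must be positive")
    return permeance_fast / sel


def in_attractive_area(h2_gpu: float, sel: float,
                       min_gpu: float = 300.0, min_sel: float = 30.0) -> bool:
    """True if both thresholds are met (hypothetical helper)."""
    return h2_gpu >= min_gpu and sel >= min_sel


if __name__ == "__main__":
    # Values reported for the 64.4 vol % MOF@PVA MMM under wet feed conditions
    h2, sel = 1118.0, 32.7
    co2 = implied_slow_permeance(h2, sel)
    print(f"implied CO2 permeance: {co2:.1f} GPU")  # 34.2 GPU
    print(f"H2 permeance in SI units: {gpu_to_si(h2):.3e}")
    print("attractive area:", in_attractive_area(h2, sel))
```

Under these assumptions, the wet-feed result implies a CO2 permeance of roughly 34 GPU, which is why the membrane stays inside the attractive area even after the slight drop in H2 permeance.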


Comparison with traditional nanoporous membranes


To understand the essential differences between the MOF@polymer MMM and traditional nanoporous membranes, we prepared pure Cu(SiF6)(pyz)3 crystalline membranes and Cu(SiF6)(pyz)3 MMMs as control samples. We attempted commonly used methods for pure MOF membrane fabrication, including hydrothermal synthesis and layer-by-layer assembly, but failed to obtain integrated membranes, consistent with the limited reports on SIFSIX-family crystalline membranes (fig. S29). The difficulty of controlling crystal intergrowth and intercrystalline defects disrupts the continuity of the Cu(SiF6)(pyz)3 membrane, rendering it unsuitable for gas separation. Additionally, we tried to convert the CuSiF6 precursor to a MOF membrane through ligand vapor treatment without using a polymer as a solid solvent (fig. S30). The XRD results confirmed the feasibility of ligand vapor treatment for the synthesis of Cu(SiF6)(pyz)3 MOF (fig. S31). However, the CuSiF6 solution cannot form an intact precursor layer on the porous substrate, resulting in isolated MOF crystals on the substrate surface and abundant MOF crystallization inside the substrate pores (fig. S32). The inferior H2-CO2 separation performance suggested the presence of defects. By contrast, with the SSP strategy, the solid solvent effectively compensates for the defects between the formed MOF particles and provides integrity to the membrane. For MOF-dominated MMMs with interconnected MOF channels, the polymer not only addresses the primary challenges of pure crystalline membranes but also highlights the advantage of translating MOF materials into molecular-sieving membranes. Furthermore, the solid solvent ensures the processability of the metal salt@polymer solution, thus allowing scale-up fabrication of membranes with an ultrathin selective layer.

We also synthesized Cu(SiF6)(pyz)3 nanoparticles and incorporated them into an H2-selective polymer [polysulfone (PSf)] with the standard solution-mixing method. The improved performance of the Cu(SiF6)(pyz)3/PSf MMMs demonstrated the positive effect of Cu(SiF6)(pyz)3 particles (fig. S33). However, from 15.2 vol % MOF loading onward, the MMM performance decreased, implying interfacial incompatibility between MOF and polymer (fig. S34). As the MOF content reached 74.3 vol %, apparent interfacial voids and particle agglomeration greatly compromised the membrane integrity (fig. S35), making it too brittle to handle for a gas permeation test (fig. S36). By contrast, the Cu(SiF6)(pyz)3@PEG MMM with 80.4 vol % MOF loading still exhibited good MOF-polymer compatibility and membrane flexibility (Fig. 4D), with no change in membrane performance compared with the membrane before bending. Scanning electron microscopy (SEM) images also showed no cracks or defects in the curled membrane (fig. S37). In Cu(SiF6)(pyz)3/PSf MMM preparation, solvent volatilization and volume shrinkage of the polymer solution lead to mismatched stress at the MOF-polymer interface. By contrast, in the SSP strategy, the CuSiF6@polymer precursor is converted to the MOF@polymer MMM through a simultaneous phase transition, thus avoiding interfacial defects. The polymer immobilized the metal salt and prevented the MOF particles from agglomerating. Meanwhile, the polymer flexibility and metal salt@polymer cosolubility allow the polymer to adhere dynamically to the metal salt and its converted MOF nanocrystals, leading to an ideal MOF-polymer interface. Hence, the MOF@polymer MMM prepared with the SSP method not only achieved ultrahigh MOF loading but also solved the main challenges of particle agglomeration and MOF-polymer incompatibility, truly realizing the advantages of combining polymers and MOFs for gas separation.
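The loadings in this section are given in vol %. When only a mass fraction and the component densities are known, the conversion is a simple mixing rule. The sketch below assumes ideal additive volumes with no interfacial voids (exactly the assumption that breaks down in the agglomerated PSf samples); the densities in the usage lines are hypothetical placeholders, not values from this work, and the paper does not state how its vol % values were obtained.

```python
"""Minimal sketch: convert filler loading between mass and volume fraction,
assuming ideal additive volumes (no voids)."""


def volume_fraction(mass_frac: float, rho_filler: float, rho_polymer: float) -> float:
    """Filler volume fraction from filler mass fraction and densities (consistent units)."""
    if not 0.0 <= mass_frac <= 1.0:
        raise ValueError("mass fraction must be in [0, 1]")
    if rho_filler <= 0 or rho_polymer <= 0:
        raise ValueError("densities must be positive")
    v_filler = mass_frac / rho_filler              # filler volume per unit total mass
    v_polymer = (1.0 - mass_frac) / rho_polymer    # polymer volume per unit total mass
    return v_filler / (v_filler + v_polymer)


def mass_fraction(vol_frac: float, rho_filler: float, rho_polymer: float) -> float:
    """Inverse conversion: filler mass fraction from filler volume fraction."""
    if not 0.0 <= vol_frac <= 1.0:
        raise ValueError("volume fraction must be in [0, 1]")
    if rho_filler <= 0 or rho_polymer <= 0:
        raise ValueError("densities must be positive")
    m_filler = vol_frac * rho_filler               # filler mass per unit total volume
    m_polymer = (1.0 - vol_frac) * rho_polymer     # polymer mass per unit total volume
    return m_filler / (m_filler + m_polymer)


if __name__ == "__main__":
    # Hypothetical densities (g cm^-3), for illustration only
    rho_mof, rho_poly = 1.6, 1.2
    w = mass_fraction(0.804, rho_mof, rho_poly)
    print(f"80.4 vol % corresponds to {100 * w:.1f} wt % with these assumed densities")
```

Because the filler in this illustration is denser than the polymer, a given vol % corresponds to a higher wt %; the direction of the correction flips if the filler is the lighter phase.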


Universality of the SSP strategy


The fabrication of MOF@polymer MMMs with high MOF loading and superior performance confirmed the feasibility of the SSP strategy. Both PEG- and PVA-based MMMs showed outstanding H2-CO2 separation performance during several hundred hours of continuous operation, indicating excellent chemical and structural stability (fig. S38). Beyond common polymers, the strategy also proved valid for different kinds of MOFs. Using the SSP strategy, we also fabricated MMMs based on zeolitic imidazolate frameworks (ZIFs), which have shown promising performance for the processing of important industrial mixtures (5). After 2-methylimidazole (mIm) vapor treatment, abundant lamellar protrusions appeared on the membrane surface, and XRD showed that the converted MOF structure was highly consistent with ZIF-L, implying the successful preparation of ZIF-L@PEG MMM (Fig. 4E and fig. S39). Overall, by changing the metal salt and ligand vapor, various MOF@polymer membranes were successfully fabricated, including M(SiF6)(pyz)3@PEG MMMs (M = Ni, Zn, or Co), Cu(SiF6)(bpy)2@PEG MMM, Ni(NbOF5)(pyz)3@PVA MMM, and ZIF-L@PEG MMM (fig. S40). All of these membranes exhibited promising separation performance well above the H2-CO2 upper bound (Fig. 4F).

Toward practical gas separation applications, we also fabricated a hollow-fiber MMM. As illustrated in Fig. 4G, the hollow fiber changed color from white to blue after the SSP treatment, indicating the successful preparation of the MMM. SEM images further confirmed the membrane integrity, with a uniform membrane thickness of ~100 nm (fig. S41). The binary H2-CO2 separation test showed that the hollow-fiber MMM exhibited H2-CO2 separation performance comparable with that of the flat-sheet MMM (fig. S42). We also explored the scale-up fabrication of flat-sheet MMMs; the separation performance of each part was stable, demonstrating the potential of the SSP strategy for large-scale applications (Fig. 4H and fig. S43).


Conclusions 


We proposed an SSP strategy for fabricating ultrathin MMMs with highly loaded MOF nanocrystals. Distinct from conventional membranes, the polymer serves as a solid solvent, which allows unimpeded gas transport through interconnected MOF channels and avoids intercrystalline defects (a key issue for crystalline membranes), leading to high H2-CO2 selectivity comparable with that of pure MOF membranes. Meanwhile, the processability and cosolubility of the metal salt@polymer precursor enable the formation of an ultrathin, ultrapermeable selective layer. The solid solvent in this work facilitates filler dispersion and ensures interfacial compatibility between filler and polymer, enabling the MMM to maintain its integrity and flexibility even at high filler loading. The matching between the polymer and MOF in terms of membrane formation and transport properties deserves further investigation. Together with its scalability and universality, this strategy not only enables the fabrication of demanding, highly loaded thin-film nanocomposite membranes but also paves the way for translating nanomaterials into molecular-sieving membranes and related functional coatings.


REFERENCES AND NOTES

1. W. J. Koros, C. Zhang, Nat. Mater. 16, 289-297 (2017).
2. J. Shen, G. Liu, Y. Han, W. Jin, Nat. Rev. Mater. 6, 294-312 (2021).
3. H. B. Park, J. Kamcev, L. M. Robeson, M. Elimelech, B. D. Freeman, Science 356, eaab0530 (2017).
4. Peng et al., Science 346, 1356-1359 (2014).
5. Knebel et al., Science 358, 347-351 (2017).
6. Liu et al., Nat. Mater. 22, 769-776 (2023).
7. Knebel, J. Caro, Nat. Nanotechnol. 17, 911-923 (2022).
8. Knebel et al., Nat. Mater. 19, 1346-1353 (2020).
9. S. J. Datta et al., Science 376, 1080-1087 (2022).
10. X. Tan et al., Science 378, 1189-1194 (2022).
11. M. Sandru et al., Science 376, 90-94 (2022).
12. X. Cui et al., Science 353, 141-144 (2016).
13. K. Uemura, A. Maeda, T. K. Maji, P. Kanoo, H. Kita, Eur. J. Inorg. Chem. 2009, 2329-2337 (2009).
14. S. D. Burd et al., J. Am. Chem. Soc. 134, 3663-3666 (2012).
15. D. Zhang et al., Science 359, 675-679 (2018).
16. X. Li et al., J. Am. Chem. Soc. 141, 12021-12028 (2019).
17. X. Wang et al., Nat. Commun. 8, 14460 (2017).
18. F. Zhang et al., Adv. Funct. Mater. 22, 3583-3590 (2012).
19. N. Wang, A. Mundstock, Y. Liu, A. Huang, J. Caro, Chem. Eng. Sci. 124, 27-36 (2015).
20. Z. Kang et al., Energy Environ. Sci. 7, 4053-4060 (2014).
21. Z. Zhong et al., J. Mater. Chem. A 3, 15715-15722 (2015).
22. V. M. Aceituno Melgar, H. T. Kwon, J. Kim, J. Membr. Sci. 459, 190-196 (2014).
23. Y. Sun et al., Angew. Chem. Int. Ed. 57, 16088-16093 (2018).
24. A. Huang, H. Bux, F. Steinbach, J. Caro, Angew. Chem. Int. Ed. 49, 4958-4961 (2010).
25. X. Dong et al., J. Mater. Chem. 22, 19222-19227 (2012).
26. F. Cacho-Bailo et al., Chem. Sci. 8, 325-333 (2017).
27. P. Su et al., J. Mater. Chem. A 3, 20345-20351 (2015).
28. Y. Liu, N. Wang, J. Pan, F. Steinbach, J. Caro, J. Am. Chem. Soc. 136, 14353-14356 (2014).
29. S. Zhou, Y. Wei, J. Hou, L. Ding, H. Wang, Chem. Mater. 29, 7103-7107 (2017).
30. N. Wang et al., J. Mater. Chem. A 3, 4722-4728 (2015).
31. J. Sanchez-Lainez et al., J. Mater. Chem. A 3, 6549-6556 (2015).
32. S. Park, K. Y. Cho, H. K. Jeong, J. Mater. Chem. A 8, 11210-11217 (2020).
33. B. A. Al-Maythalony et al., ACS Appl. Mater. Interfaces 9, 33401-33407 (2017).
34. Y. Zhao et al., Separ. Purif. Tech. 220, 197-205 (2019).
35. X. Ma, X. Wu, J. Caro, A. Huang, Angew. Chem. Int. Ed. 58, 16156-16160 (2019).
36. Z. Kang et al., J. Mater. Chem. A 3, 20801-20810 (2015).
37. Z. Hu et al., Ind. Eng. Chem. Res. 55, 7933-7940 (2016).
38. H. Fan et al., Nat. Commun. 12, 38 (2021).
39. S. Grimme, S. Ehrlich, L. Goerigk, J. Comput. Chem. 32, 1456-1465 (2011).
40. J. Sanchez-Lainez et al., Chemistry 24, 11211-11219 (2018).
41. E. V. Perez, G. J. D. Kalaw, J. P. Ferraris, K. J. Balkus Jr., I. H. Musselman, J. Membr. Sci. 530, 201-212 (2017).
42. Y. S. Do, J. G. Seong, S. Kim, J. G. Lee, Y. M. Lee, J. Membr. Sci. 446, 294-302 (2013).
43. D. R. Pesiri, B. Jorgensen, R. C. Dye, J. Membr. Sci. 218, 11-18 (2003).
44. T. Yang, Y. Xiao, T. S. Chung, Energy Environ. Sci. 4, 4171-4180 (2011).
45. J. Sanchez-Lainez et al., J. Membr. Sci. 515, 45-53 (2016).
46. L. Zhu, M. T. Swihart, H. Lin, Energy Environ. Sci. 11, 94-100 (2018).
47. M. Shan et al., Sci. Adv. 4, eaau1698 (2018).
48. J. Sanchez-Lainez et al., Adv. Mater. Interfaces 5, 1800647 (2018).
49. L. Hu, S. Pal, H. Nguyen, V. Bui, H. Lin, J. Polym. Sci. 58, 2467-2481 (2020).


ACKNOWLEDGMENTS

We thank X. Ren (School of Chemistry and Molecular Engineering, Nanjing Tech University) for helpful discussions. We are grateful to the High-Performance Computing Center of Nanjing Tech University for supporting the computational resources. Funding: W.J. acknowledges funding from the Ministry of Science and Technology of the People’s Republic of China (2022YFB3804800), the National Natural Science Foundation of China (22038006, 21921006), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). Go.L. acknowledges funding from the National Natural Science Foundation of China (22278210) and the Natural Science Foundation of Jiangsu Province (BK20220002). Author contributions: W.J., Go.L., and G.C. conceived the idea. W.J., Go.L., and G.C. designed the experiments, analyzed the data, and wrote the manuscript. G.C. synthesized and characterized the membranes. Y.G. conducted the DFT simulations. Z.C., Y.P., and Gu.L. analyzed the membrane data. C.C. and Y.H. collected and analyzed the low-dose HRTEM data. G.C., Go.L., Y.H., W.J., and N.X. discussed the results and commented on the manuscript. Competing interests: A US patent (application number 18/165235), with W.J., Go.L., and G.C. as co-inventors, has been filed by Nanjing Tech University. Data and materials availability: All data needed to evaluate the conclusions in the paper are presented in the paper and/or the supplementary materials. License information: Copyright © 2023 the authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original US government works. https://www.science.org/about/science-licenses-journal-article-reuse

22 September 2023


SUPPLEMENTARY MATERIALS

science.org/doi/10.1126/science.adi1545
Materials and Methods
Figs. S1 to S43
Tables S1 to S7
References (50-64)

Submitted 6 April 2023; accepted 31 July 2023
10.1126/science.adi1545


RESEARCH 


HOST-GUEST CHEMISTRY 


Disequilibrating azobenzenes by visible-light sensitization under confinement

Photoisomerization of azobenzenes from their stable E isomer to the metastable Z state is the basis of numerous applications of these molecules. However, this reaction typically requires ultraviolet light, which limits applicability. In this study, we introduce disequilibration by sensitization under confinement (DESC), a supramolecular approach to induce the E-to-Z isomerization by using light of a desired color, including red. DESC relies on a combination of a macrocyclic host and a photosensitizer, which act together to selectively bind and sensitize E-azobenzenes for isomerization. The Z isomer lacks strong affinity for and is expelled from the host, which can then convert additional E-azobenzenes to the Z state. In this way, the host-photosensitizer complex converts photon energy into chemical energy in the form of out-of-equilibrium photostationary states, including ones that cannot be accessed through direct photoexcitation.

Azobenzene and its derivatives are arguably the simplest and most widely studied photoswitchable compounds (1-3). Upon exposure to ultraviolet (UV) light, the planar (4), nonpolar E isomer of azobenzene isomerizes to the metastable Z form (Fig. 1A), which is nonplanar and substantially more polar. The Z→E back-isomerization occurs spontaneously and can be accelerated with visible (blue) light. Owing to the highly reversible nature of E⇄Z photoswitching, azobenzenes and other azoarenes (5) have found applications in energy storage systems (6, 7), switchable catalysis (8, 9), controlled release (10, 11), and photopharmacology (12, 13), to name but a few (14, 15). However, the need for UV light to generate the metastable Z isomer has severely limited the applicability of these compounds. Shifting E-azobenzene's absorption band to the visible range can be achieved by decorating it with various substituents (16, 17), but this approach requires additional synthetic effort and changes the compound's identity.

Natural systems have evolved an alternative, supramolecular strategy to extend the absorption spectral range of photoswitchable molecules (18, 19). For example, deep-sea fishes install a chlorophyll antenna next to the opsin-bound retinal (20, 21). This antenna captures red light and sensitizes the nearby retinal by means of a triplet-energy transfer (TET) mechanism (22). The subsequent photoisomerization of retinal induces a large conformational change in the surrounding opsin protein, ultimately enabling the fish to detect red light (23).

Similar to retinal, azobenzene can be switched through TET (24-26). Unfortunately, this process has long been known (27, 28) to unidirectionally convert E/Z mixtures to the thermodynamically stable E isomer. This directionality originates from (i) the higher tendency of Z (over E) to act as a triplet-energy acceptor (29) and (ii) the preferential [by a factor of >50 (28)] relaxation of the triplet excited state of azobenzene to E over Z. Therefore, whereas various photosensitizers can rapidly and efficiently equilibrate the high-energy Z isomer into the stable E state, the reverse reaction (i.e., sensitized disequilibration) is far more challenging and has remained elusive.

Fig. 1. Disequilibration by sensitization under confinement (DESC). (A) The transformation of the stable E isomer of azobenzene to the metastable Z isomer traditionally relies on the use of ultraviolet (λ = 350 nm) light. (B) The mechanism of DESC is as follows: (i) formation of the ternary inclusion complex (E-PS)⊂H (PS, photosensitizer; H, host); (ii) absorption of a photon of visible light by the PS followed by intersystem crossing (ISC); (iii) triplet-energy transfer (TET) and the formation of triplet azobenzene, followed by its relaxation (iv) to Z-azobenzene or (iv′) back to E-azobenzene (corresponding to internal conversion); and (v) disassembly of the unstable (Z-PS)⊂H inclusion complex. (C) Components of the supramolecular system used for DESC include macrocyclic host H coassembled from six Pd2+ ions and four triimidazole ligands, and a photosensitizer (e.g., BODIPY ps1). (D) Structural formulae of azoarenes 1 to 9 investigated in this study.


The concept of disequilibration by sensitization under confinement


We hypothesized that sensitized disequilibration might be achieved by using a photosensitizer (PS) that acts on the E isomer of azobenzene with high selectivity (Fig. 1B). We have previously shown that (i) the water-soluble, palladium-containing macrocyclic host H (30) (Fig. 1C) binds two molecules of various E-azoarenes (which are planar and readily stack on top of each other to form noncovalent homodimers) but only one molecule in the Z configuration (31, 32) [because of its nonplanar (33) geometry]; (ii) host H can also encapsulate, and thus induce noncovalent dimerization of, guests structurally similar to E-azobenzene (i.e., planar aromatic molecules), including various dyes (34, 35); and (iii) mixing two different inclusion complexes (each binding two molecules of a given guest) induces rapid guest exchange between the hosts, affording heterodimeric complexes in which the host encapsulates two different guest molecules (36). Taken together, these findings led us to speculate that host H could coencapsulate the E isomer of azobenzene and a PS (thus bringing them into close proximity) while preventing close encounters of the same PS with Z-azobenzene (which is bound as a sole guest). We call this approach disequilibration by sensitization under confinement (DESC).
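Why confinement matters can be made semi-quantitative with a deliberately simplified steady-state model; this is our illustration, not the authors' kinetic analysis. Assume that E and Z accept triplet energy with pseudo-first-order rates k_E and k_Z, and that any azobenzene triplet relaxes to Z with probability p_Z (to E otherwise). Balancing the E→Z and Z→E fluxes gives [Z]/[E] = k_E p_Z / (a k_Z (1 − p_Z) + k_th), where a is a hypothetical "access" factor (a = 1 in free solution, a → 0 when Z is expelled from the host and never meets the sensitizer) and k_th is an optional thermal Z→E rate. With the >50:1 triplet branching toward E cited above, free-solution sensitization caps Z at roughly 2%.

```python
"""Toy steady-state model of triplet-sensitized E/Z isomerization.
All rates are pseudo-first-order (s^-1); every parameter is illustrative."""


def z_fraction_pss(k_e: float, k_z: float, p_z: float,
                   access_z: float = 1.0, k_thermal: float = 0.0) -> float:
    """Steady-state fraction of Z.

    k_e, k_z  : triplet-energy-transfer rates to E and to Z
    p_z       : probability that an azobenzene triplet relaxes to Z
    access_z  : hypothetical factor scaling Z's access to the sensitizer
                (1.0 = free solution, 0.0 = Z fully excluded, as DESC intends)
    k_thermal : optional thermal Z -> E relaxation rate
    """
    if not 0.0 < p_z < 1.0:
        raise ValueError("p_z must lie strictly between 0 and 1")
    forward = k_e * p_z                                   # E -> Z flux per unit [E]
    backward = access_z * k_z * (1.0 - p_z) + k_thermal   # Z -> E flux per unit [Z]
    if forward == 0.0:
        return 0.0            # nothing ever switches to Z
    if backward == 0.0:
        return 1.0            # nothing converts Z back, so all E is eventually switched
    ratio = forward / backward                            # [Z]/[E] at steady state
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    p = 1.0 / 51.0            # triplet relaxes to E over Z by a factor of 50
    print("free solution, k_Z = k_E:", round(z_fraction_pss(1.0, 1.0, p), 4))
    print("free solution, k_Z = 2 k_E:", round(z_fraction_pss(1.0, 2.0, p), 4))
    print("Z access cut 1000-fold:", round(z_fraction_pss(1.0, 1.0, p, access_z=1e-3), 4))
```

In this toy picture, making Z a better acceptor (k_Z > k_E) pushes the photostationary state even further toward E, whereas cutting Z's access to the sensitizer lets Z dominate despite the unfavorable triplet branching, which is the qualitative idea behind DESC.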

The concept of DESC is illustrated in Fig. 1B. The addition of an encapsulated PS [i.e., (PS)2⊂H] to E-azobenzene induces the formation of a ternary complex (E-PS)⊂H (step i). Upon exposure to visible light, PS is promoted to a singlet excited state, which relaxes to a triplet state through intersystem crossing (ISC) (step ii). In step iii, PS transfers its triplet energy to the coconfined E-azobenzene. The resulting triplet azobenzene, which cannot be generated by direct photoexcitation, can either decay to the initial E isomer (step iv′) or transform into the Z state (step iv). The former case regenerates (E-PS)⊂H, which can be reexcited. By contrast, the latter case results in (Z-PS)⊂H, an unstable complex because Z-azobenzene is too bulky to coexist with the PS inside H. At the same time, Z as a sole guest is bound relatively weakly (fig. S103); hence, it is expelled from the host and effectively removed from the equilibrium. Thus, the azobenzene-free inclusion complex of the PS is regenerated (step v) and becomes available for transforming additional molecules of E- into Z-azobenzene.

To verify our hypothesis, we initially focused on the parent azobenzene E-1 and the prototypical boron-dipyrromethene (BODIPY) dye ps1 (Fig. 1, C and D). Both E-1 and ps1 form homodimers within H's cavity, as previously elucidated by several techniques including x-ray diffraction, nuclear magnetic resonance (NMR), and UV-visible (vis) absorption spectroscopy (31, 32, 34). Figure 2B shows the UV-vis spectrum (dotted brown line) obtained after mixing aqueous solutions of the two homodimers, (E-1)2⊂H and (ps1)2⊂H, in a 1:1 molar ratio. The absorption profile in the visible range is practically identical to that of pure (ps1)2⊂H (dotted blue line), indicating only a minute fraction of the (E-1-ps1)⊂H heterodimer (i.e., the equilibrium shown in Fig. 2A heavily favors the two homodimers). However, exposing this solution to low-intensity green light (wavelength at maximum intensity, λmax = 525 nm; 2.5 mW cm−2) resulted in a substantial (~35%) decrease of absorption in the near-UV region (Fig. 2B), indicating the E→Z isomerization of 1. This result suggests that the small amount of (E-1-ps1)⊂H in equilibrium with the homodimers absorbs green light, the energy of which is eventually used to generate a large amount of the metastable Z isomer (Fig. 1B). The low illumination intensities used in our studies exclude the possibility of two-photon isomerization (37, 38), which we confirmed directly through power-dependence experiments (fig. S108).
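A power-dependence test of this kind usually rests on how the initial isomerization rate scales with light intensity: a one-photon process gives rate ∝ I, a two-photon process rate ∝ I². The minimal sketch below assumes that initial rates have already been extracted (for example, from the early-time absorbance change) and estimates the exponent as the least-squares slope on a log-log plot. It is a generic illustration; fig. S108 may have been analyzed differently, and the intensities in the usage lines are synthetic.

```python
"""Estimate the power-law exponent n in rate = k * I**n from paired data."""

import math
from typing import Sequence


def power_law_exponent(intensities: Sequence[float], rates: Sequence[float]) -> float:
    """Least-squares slope of ln(rate) versus ln(intensity)."""
    if len(intensities) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two paired points")
    if any(i <= 0 for i in intensities) or any(r <= 0 for r in rates):
        raise ValueError("intensities and rates must be positive")
    xs = [math.log(i) for i in intensities]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("intensities must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic intensities (mW cm^-2) and model rates, for illustration only
    intensity = [0.5, 1.0, 2.5, 5.0]
    one_photon = [0.02 * i for i in intensity]
    two_photon = [0.02 * i ** 2 for i in intensity]
    print("one-photon exponent:", round(power_law_exponent(intensity, one_photon), 3))
    print("two-photon exponent:", round(power_law_exponent(intensity, two_photon), 3))
```

With real data, a fitted exponent close to 1 across the intensity range supports a one-photon mechanism, whereas a value approaching 2 would point to two-photon absorption.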

To determine the scope of DESC, we extended our studies to a diverse portfolio of azobenzenes and other azoarenes, including derivatives with charged groups and electron-donating and -withdrawing substituents (2 to 9; Fig. 1D). All of these compounds were encapsulated as homodimers within host H, as confirmed by NMR spectroscopy (supplementary materials). Similar to 1, most of these guests preferentially existed as E2⊂H homodimers even in the presence of excess (ps1)2⊂H. However, azobispyrazole 9 (39) [and, to some extent, azopyrazole 8 (40)] showed a strong tendency to form a heterodimer with ps1, as manifested by the intense 509-nm peak in the absorption spectrum (Fig. 2C, dotted brown line). The high fraction of the heterodimer allowed us to grow single crystals and determine the structure by x-ray diffraction, revealing E-9 and ps1 bound tightly inside the cavity of the host (Fig. 2D). Exposure of (E-9-ps1)⊂H to 525-nm light quenched its near-UV absorption, consistent with E→Z isomerization (Fig. 2C). The putative (Z-9-ps1)⊂H heterodimer is unstable, forcing ps1 into homodimers, which explains why the 400- to 600-nm portion of the spectrum at the end of the reaction is nearly identical to that of pure (ps1)2⊂H (Fig. 2C). Similar to 1 and 9, compounds 2 through 8 also switched to their Z isomers when exposed to 525-nm light in the presence of (ps1)2⊂H (fig. S79).

The vastly different heterodimer populations in 1 + ps1 versus 9 + ps1 mixtures do not translate into major differences in the reaction kinetics: The former comprises only ~2% heterodimer but requires only twice as much time as the latter (which has a heterodimer fraction of ~80%) to reach a photostationary state (PSS). This finding reflects the rapid guest-exchange kinetics between hosts (36), which led us to hypothesize that DESC should also work efficiently with catalytic amounts of the PS. Indeed, decreasing the amount of (ps1)2⊂H to only 0.05 equiv. with respect to (E-9)2⊂H extended the time required to reach the PSS fourfold (Fig. 2E) but did not markedly affect its composition.

The finding that E-9 and ps1 form a heterodimer in near-quantitative yield allowed us to determine the quantum yield (QY) of DESC for this pair. Here, we note that ps1 within (E-9-ps1)⊂H is highly emissive, but its fluorescence in (ps1)2⊂H is largely quenched (34). Therefore, exposing (E-9-ps1)⊂H to a 515-nm pulsed laser led to a gradual decrease of emission (Fig. 2F, empty markers). When the experiment was repeated in the presence of an extra 2 and 4 equiv. of (E-9)2⊂H, however, we observed lag periods of ~8 and ~18 min, respectively. The stable fluorescence, despite the ongoing E→Z isomerization of 9, indicates that the concentration of (E-9-ps1)⊂H remains steady, which confirms the rapid exchange kinetics in our system: As soon as the isomerized Z-9 is expelled from the host, it is replaced by another copy of E-9 (if available). By assessing the mean number of absorbed photons required to convert the excess of E-9, we found that each successful E→Z isomerization event requires 17 photons on average (the derivation can be found in the supplementary materials), which corresponds to a QY of ~6%, a notably high value given the number of steps separating the excitation of ps1 from the formation of Z-9 (Fig. 1B).
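The quantum-yield statement above is simple arithmetic: if 17 absorbed photons are needed per E→Z event, QY = 1/17 ≈ 0.059, or about 6%. Counting the photons absorbed from a laser source uses E_photon = hc/λ. The sketch below is a minimal illustration with exact SI constants; the laser power, exposure time, and absorbed fraction in the usage lines are hypothetical, and the authors' actual derivation (supplementary materials) may differ.

```python
"""Photon counting and quantum yield (QY) arithmetic."""

PLANCK = 6.62607015e-34       # J s (exact SI value)
LIGHT_SPEED = 299_792_458.0   # m s^-1 (exact SI value)
AVOGADRO = 6.02214076e23      # mol^-1 (exact SI value)


def photon_energy(wavelength_nm: float) -> float:
    """Energy of one photon in joules."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)


def photons_absorbed(power_w: float, seconds: float, wavelength_nm: float,
                     fraction_absorbed: float = 1.0) -> float:
    """Number of photons absorbed from a beam of given power and duration."""
    return power_w * seconds * fraction_absorbed / photon_energy(wavelength_nm)


def quantum_yield(molecules_converted: float, photons: float) -> float:
    """QY = isomerization events per absorbed photon."""
    if photons <= 0:
        raise ValueError("photon count must be positive")
    return molecules_converted / photons


if __name__ == "__main__":
    print(f"QY for 17 photons per event: {quantum_yield(1, 17):.3f}")  # 0.059
    # Hypothetical exposure: 1 mW at 515 nm for 60 s, 50% absorbed
    n_photons = photons_absorbed(1e-3, 60.0, 515.0, 0.5)
    print(f"absorbed photons: {n_photons:.3e} ({n_photons / AVOGADRO:.3e} mol)")
```

Dividing the moles of Z formed (measured independently, for example by absorption spectroscopy) by the moles of absorbed photons gives the QY directly.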

Once generated by DESC, the Z isomer can 
be back-isomerized to E through direct excita- 
tion with blue light (435 nm), and the process 
can be repeated for many cycles. To demon- 
strate the robustness of DESC, we subjected 
compound 9 to >100 switching cycles and did 
not observe any noticeable fatigue: both 9 and 
ps1 retained their initial absorbance values
(Fig. 2G). 


Time-resolved spectroscopic and computational studies of DESC


In solution, BODIPY dyes as simple as ps1 are poor triplet sensitizers (41, 42); therefore, the finding that ps1 acts as an efficient photosensitizer in DESC is unexpected. To obtain mechanistic insights into DESC, we performed transient absorption spectroscopy (TAS) and computational studies (Fig. 3). First, we studied the photoinduced dynamics of the (ps1)2⊂H homodimer using femtosecond TAS (fs-TAS). Figure 3A shows the fs absorption changes at two different wavelengths following excitation of (ps1)2⊂H with a 500-nm laser. The initial (<1 ps) ground-state bleach at 483 nm, accompanied by excited-state absorption at 412 nm, is indicative of the transition from the ground state (S0) to the singlet excited state (S1) in ps1. At delay times >100 ps, the 412-nm absorption increases further and the 483-nm bleach becomes more pronounced, which can be attributed (43) to ISC from the S1 state to the triplet excited state (T1). Using microsecond TAS (µs-TAS), we found the resulting triplet state to be remarkably stable, with a monoexponential lifetime of 16.5 ± 0.5 µs under ambient conditions (fig. S114A). As expected for a triplet state, the lifetime is strongly dependent on the amount of oxygen in the solvent; decreasing the amount of O2 by bubbling N2 for 4 min and 10 min extended the lifetime of the T1 state of ps1 to 160 ± 4 µs and 10.1 ± 0.5 ms, respectively (fig. S114B).

When the fs-TAS experiment was repeated for a 1:2 mixture of (ps1)2⊂H and (E-9)2⊂H (i.e., a pair with a high tendency to form a heterodimer), the bleach at 483 nm was substantially less pronounced (Fig. 3B, inset), indicating TET to E-9 (Fig. 1B, step iii). Notably, the TET and the subsequent formation of Z-9 occur in the nanosecond time regime, much faster than the lifetime of the ps1 triplet state, which explains why DESC does not require the exclusion of oxygen. In fact, we found the process to be equally efficient in strictly deoxygenated versus thoroughly oxygenated water (fig. S91).

Fig. 2. Following DESC by steady-state absorption and emission spectroscopy. (A) Equilibrium between (top) homodimeric inclusion complexes of a PS and an E-azoarene, (PS)2⊂H and E2⊂H, respectively, and (bottom) the heterodimeric complex (E-PS)⊂H. Only E residing within the heterodimer, but not the homodimer, can be switched with visible light. The resulting Z is encapsulated as a sole guest (if enough host is available) and cannot be sensitized either. (B) Absorption spectrum of a 1:1 mixture of (ps1)2⊂H and (E-1)2⊂H (dotted brown line) and changes in the spectra accompanying irradiation with green light (λmax = 525 nm, denoted by green shading). The dotted blue line represents the absorption spectrum of (ps1)2⊂H. (C) Absorption spectrum of a 1:1 mixture of (ps1)2⊂H and (E-9)2⊂H [dotted brown line; predominantly (E-9-ps1)⊂H] and changes in the spectra accompanying irradiation with green light (λmax = 525 nm, denoted by green shading). The dotted blue line represents the absorption
spectrum of (ps1)2⊂H. (D) The x-ray crystal structure of the heterodimeric complex (E-9-ps1)⊂H (from the left: front view, side view, and top view; light gray, host H; dark gray, E-9; green, ps1; water molecules, counterions, and host protons omitted for clarity). (E) Graph following DESC of 9 in the presence of different equiv. of ps1 (the data were normalized to the 0 to 1 range, except the experiment with no PS; raw spectra can be found in fig. S76). Abs., absorbance. (F) Emission intensity of ps1 under 515-nm light (used both to induce DESC and excite fluorescence) as a function of the amount of 9. Em., emission. (G) More than 100 cycles of reversible photoisomerization of […] light (E→Z, DESC with 525-nm light for 2 min; Z→E, […] 435-nm light for 30 s). The amount of the E isomer […] at 353 nm; the absorption at 480 nm originates […]. norm., normalized.

We also studied the E-9-ps1 pair under ambient conditions by µs-TAS (Fig. 3C) and found the intensity of transient absorption at 430 nm within 0.1 µs after excitation (ΔAbs*430) to be inversely proportional to the amount of E-9. This finding was consistent with the quenching of ps1's triplet state by its E-9 co-guest by means of TET. The resulting triplet-9 can either relax to the initial E-9 isomer or switch to Z-9, which absorbs at 430 nm, hence the increasing steady-state absorption (ΔAbsss430). This intimate relationship between the degree of ps1 triplet-state quenching and the extent of E→Z isomerization identified by µs-TAS further confirms that DESC proceeds by means of TET between the ps1 donor and the E-9 acceptor.

To gain further insights into DESC, we studied various azoarene-PS combinations as noncovalent heterodimers by using quantum chemical simulations. We consistently found that the lowest-energy triplet state within these heterodimers was localized on the PS (therefore, we refer to it as TPS) and the second-lowest triplet state was localized on the azoarene (i.e., Tazo), which indicates that the TPS→Tazo transition is an endothermic process (25, 44) (e.g., Fig. 3D shows the energy diagram for 1-ps1). These results led us to hypothesize that the experimentally observed TET might be facilitated by thermal fluctuations of molecules, as suggested previously for other triplet donor-acceptor pairs (25, 26, 45, 46). Therefore, we
studied the dependence of the 1-ps1 excited-state energies on the C-N=N-C dihedral angle (Φ) in azobenzene 1 (which is substantially more flexible than ps1). Figure 3E shows a relaxed scan for the 1-ps1 heterodimer, demonstrating that an 18° twist in Φ is sufficient to invert the energetic order of Tps1 and Tazo, making TET energetically favorable. We separately studied the dynamics of the (1-ps1)⊂H heterodimer by multiscale molecular dynamics simulations (fig. S121 and movie S1). These simulations revealed that thermal fluctuations readily allow 1 to adopt conformations with ΔΦ = 18° at room temperature.

Fig. 3. Time-resolved spectroscopic and computational studies of DESC. (A) Normalized decays of fs transient absorption of ps1 at 412 nm and 483 nm within the (ps1)2⊂H homodimer (excitation wavelength, λexc = 500 nm). Points, raw data; lines, four-exponential fits (details can be found in the supplementary materials). norm., normalized. (B) Normalized transient-absorption decays of ps1 at 483 nm (λexc = 500 nm) in (ps1)2⊂H versus the (E-9-ps1)⊂H heterodimer (linear scale in the 0 to 1 ps time range; logarithmic scale beyond 1 ps). (Inset) The same data plotted on the linear scale. (C) Decays of μs transient absorption at 430 nm (λexc = 510 nm) in (ps1)2⊂H in the presence of increasing amounts of (E-9)2⊂H. The thin and thick lines correspond to experimental data and biexponential fits, respectively. (Inset) The inverse correlation between ΔAbs*430 (absorbance at 430 nm immediately after photoexcitation) and ΔAbs∞430 (steady-state absorbance after photoswitching). All the TAS results presented here were collected under ambient (nondeoxygenated) conditions. (D) The calculated energies of the E-1-ps1 heterodimer's lowest excited states: the bright singlet state (Sps1) and the two lowest triplet states, localized on ps1 and E-1 (Tps1 and Tazo, respectively) (details can be found in fig. S116). The arrows indicate the sequence of events ("hν" indicates a photoinduced transition; the gray arrow indicates an endothermic process). (E) Ground-state relaxed scan along the C-N=N-C dihedral angle Φ in 1 within the 1-ps1 heterodimer. The gray line denotes the ground state (S0), the red line denotes the Sps1 state, and the blue lines denote the Tazo and Tps1 states. The circular and triangular markers correspond to the localization of the excited state on the donor (ps1) and acceptor (E-1), respectively. (F) Ground-state relaxed scan of Φ in 1 within the (1-ps1)⊂H heterodimer (orange trace). The blue trace shows the energies that correspond to the same configurations of 1 and ps1 after removing the host and its interactions. (i), (ii), and (iii) correspond to ΔΦ values of 0°, +165°, and -170°, respectively. (G) Optimized geometries of (1-ps1)⊂H for the three ΔΦ values indicated in (F) (left, side views; right, top views). The distances between the indicated equatorial Pd nodes describe the degree of host deformation; the larger the difference between the two Pd-Pd distances, the greater the transition of H from a tube-like conformation into a bowl-like conformation.

We performed additional multiscale simulations to better understand the relative instability of the (Z-PS)⊂H heterodimers versus (E-PS)⊂H (Fig. 1B), which lies at the heart of efficient DESC. The starting point of the simulations was (E-1-ps1)⊂H with a perfectly planar geometry of E-1. We performed a relaxed scan by changing Φ in steps of 5° in both directions of rotation; the resulting energies are plotted in Fig. 3F in orange. The blue curves in Fig. 3F correspond to the same geometries while neglecting the host and its interactions; therefore, the energetic difference between the two curves (highlighted with gray shading) quantifies the instability of the inclusion complex. We found that rotating Φ in one direction affords a highly unstable supramolecular architecture (Fig. 3G, ii) that is ~0.35 eV (~8 kcal/mol) higher in energy than free Z-1-ps1 (Fig. 3F). Notably, rotating Φ in the opposite direction gave rise to a geometry where the host did not markedly increase the energy (Fig. 3, F and G, iii). However, in this structure, host H assumes a bowl-like conformation, and Z-1 extrudes from the cavity, facilitating its expulsion to the solution (movies S5 and S6). The high conformational flexibility of H (47) is an important requirement for DESC; indeed, the cavities of rigid coordination cages (48) and other confined environments (49, 50) were shown to render azobenzene nonphotoswitchable at all wavelengths of light.
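The ~0.35 eV (~8 kcal/mol) destabilization quoted above follows from the standard conversion 1 eV per molecule ≈ 23.06 kcal/mol. The short Python sketch below checks that conversion and, as a rough energy-only illustration that is not part of the authors' analysis, also evaluates the Boltzmann factor exp(−ΔE/kBT) at 298.15 K to indicate how strongly a configuration carrying such a penalty is disfavored thermally. The function names are hypothetical, and entropy, solvation, and degeneracy effects are ignored.

```python
import math

# 1 eV per molecule = 96.485 kJ/mol = 23.0605 kcal/mol (using 1 kcal = 4.184 kJ)
EV_TO_KCAL_PER_MOL = 23.0605
# Boltzmann constant in eV/K
K_B_EV = 8.617333262e-5


def ev_to_kcal_per_mol(energy_ev: float) -> float:
    """Convert a per-molecule energy from eV to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL


def boltzmann_ratio(delta_e_ev: float, temperature_k: float = 298.15) -> float:
    """Return exp(-dE / kT) for an energy penalty dE given in eV.

    Energy-only estimate: entropy, solvation, and degeneracy are ignored.
    """
    return math.exp(-delta_e_ev / (K_B_EV * temperature_k))


if __name__ == "__main__":
    penalty_ev = 0.35  # destabilization of geometry (ii) quoted in the text
    print(f"{penalty_ev} eV ~ {ev_to_kcal_per_mol(penalty_ev):.1f} kcal/mol")
    print(f"Boltzmann factor at 298.15 K ~ {boltzmann_ratio(penalty_ev):.1e}")
```

The first printed line reproduces the ~8 kcal/mol figure; the second gives a factor on the order of 10^-6, consistent with describing geometry (ii) as highly unstable.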


Tuning the excitation wavelength of DESC 


Encouraged by the unexpected sensitization potency of ps1 under confinement, we considered DESC with other, more red-shifted dyes, including ones not previously known to act as triplet sensitizers. To this end, we first focused on the fluorinated BODIPY ps2 (Fig. 4A), with an absorption peak centered at 553 nm (51) (compared with 499 nm for ps1). We found that ps2 exhibited a higher affinity than ps1 for forming heterodimers with various azoarenes and hypothesized that the stronger PS-azoarene interactions should further promote DESC. Indeed, Fig. 4B shows that ps2 induces a near-quantitative E→Z conversion of an equimolar amount of azobenzene 4 within only 90 s of low-intensity (2.5 mW cm−2) yellow-light (561 nm) irradiation. The more efficient DESC allowed us to decrease the PS loading further: At only 0.01 equiv. of (ps2)2⊂H with respect to (E-4)2⊂H, the PSS was reached within ~20 min (Fig. 4C).

Fig. 4. Extending the concept of DESC to red-shifted photosensitizers. (A) Structural formulae of fluorinated BODIPY ps2, resorufin ps3, and resazurin ps4. (B) Changes in the absorption spectra of encapsulated E-4 in the presence of an equimolar amount of encapsulated sensitizer ps2 under yellow light (λmax = 561 nm; 2.5 mW cm−2). (C) DESC (here, for E-4) in the presence of substoichiometric amounts of ps2. (D) Changes in the absorption spectra of encapsulated E-1 in the presence of an equimolar quantity of encapsulated sensitizer ps3 under orange light (λmax = 599 nm; 0.8 mW cm−2). (E) DESC (here, for E-1) in the presence of substoichiometric amounts of ps3. (F) Changes in the absorption spectra of encapsulated E-2 in the presence of an equimolar quantity of encapsulated sensitizer ps4 under red light (λmax = 635 nm; 3.4 mW cm−2). (G) DESC (here, for E-2) in the presence of substoichiometric amounts of ps4. The data in (C), (E), and (G) were normalized to the 0 to 1 range, except for the experiments with no PS; raw data can be seen in figs. S85G, S93F, and S98G, respectively. norm., normalized.

In general, ps2 is a more efficient DESC agent than ps1. However, we found one exception: ps2 proved unable to induce the switching of azobispyrazole E-9. To understand this result, we resorted to quantum chemical simulations and found the Tps-Tazo energy gap for the E-9-ps2 heterodimer to be exceptionally high (1.03 eV, compared with 0.26 eV for E-1-ps1 in Fig. 3D). Relaxed scans analogous to those in Fig. 3E showed that Φ in 9 must twist by 38° (a prohibitively large distortion) for the energies of these two triplet states to equalize (fig. S122D). These computational results not only rationalize the experimental findings but also provide further (although indirect) support for the involvement of the TET mechanism in our system.
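To put the two calculated gaps on a common scale, one can borrow the classical Sandros relation for triplet energy transfer, k_TET = k_diff / (1 + exp(−ΔE/kBT)) with ΔE = E_T(donor) − E_T(acceptor). This is an assumption made only for illustration: the relation was developed for diffusional quenching and treats the gap as a simple thermal barrier, whereas the text invokes a twist-assisted pathway in co-confined pairs, so k_diff here is only a nominal reference rate. The hypothetical helper below shows why a 1.03 eV gap is effectively insurmountable without such a distortion, while 0.26 eV is merely unfavorable.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def sandros_fraction(delta_e_ev: float, temperature_k: float = 298.15) -> float:
    """Classical Sandros estimate of k_TET / k_diff.

    delta_e_ev = E_T(donor) - E_T(acceptor) in eV; a negative value means the
    donor triplet lies below the acceptor triplet (endothermic transfer).
    """
    return 1.0 / (1.0 + math.exp(-delta_e_ev / (K_B_EV * temperature_k)))


if __name__ == "__main__":
    # Calculated T_ps-T_azo gaps quoted in the text; T_ps is the lower state,
    # so the donor-minus-acceptor energy difference is negative.
    gaps_ev = {"E-1-ps1": 0.26, "E-9-ps2": 1.03}
    for pair, gap in gaps_ev.items():
        print(f"{pair}: gap {gap:.2f} eV -> k_TET/k_diff ~ {sandros_fraction(-gap):.1e}")
```

Under these assumptions the two estimates differ by roughly 13 orders of magnitude, which is in line with the observation that ps2 fails to switch E-9.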

We also worked with resorufin ps3 and resazurin ps4 (Fig. 4A), both of which were previously reported to form inclusion complexes of the (PS)2⊂H type (35). These two dyes are red-shifted even further than ps2; for example, the absorption maxima of their respective heterodimers with E-1 appear at 587 and 616 nm, with absorption extending into the red spectral range. To our satisfaction, exciting the absorption bands of these heterodimers with orange and red light, respectively, resulted in highly efficient E→Z isomerization for nearly all azoarene-PS combinations (Fig. 4 and figs. S95 and S100).

The performance of DESC is showcased in Fig. 5A, which lists the PSS compositions (blue font) for all nine model azoarenes shown in Fig. 1D (encapsulated within H in water with 0.05 equiv. of the selected sensitizer: ps2 for 1 to 7 and ps1 for 8 and 9). The reactions were performed on the NMR scale (i.e., milligram quantities of 1 to 9) and can readily be scaled up to obtain the Z isomers on the preparative scale (tens of milligrams). As control experiments (red font), we irradiated 1 to 9 under the same conditions and in the presence of the PS, but without host H (hence, in an organic solvent). In the absence of H, the E isomers could not be co-confined with the PS, which resulted in negligible Z-isomer formation by direct photoexcitation [only azobenzene 6, known for its visible-light responsiveness (17), afforded a small (14%) amount of Z]. The positively charged azobenzene 5 (52) [often recognized as a prototypical photopharmacophore (13, 53)] showed a particularly striking contrast in behavior between the presence and absence of the host, affording 98% Z. Notably, such a Z-rich PSS cannot be achieved by direct photoisomerization [of either (Z-5)⊂H or free 5] with any wavelength of light (the same is true for compounds 1, 3, and 6) because the absorption bands of the two isomers partially overlap (54). By contrast, the PSS composition in DESC is dictated by the tendency of the two isomers to form the ternary (azo-PS)⊂H complex, and this tendency is overwhelmingly higher for the E isomer.
Fig. 5. Performance and selectivity of DESC. (A) Numbers in blue represent the composition of the PSS of azoarenes 1 to 9 subjected to DESC on the millimolar scale in the presence of 0.05 equiv. of the PS (ps2 under 561-nm light for 1 to 7; ps1 under 525-nm light for 8 and 9). Numbers in red represent the PSS compositions of the same azoarene-PS mixtures under identical illumination conditions, but in the absence of the host (CDCl3 was used as the solvent for all azoarenes except 3 and 5, for which CD3OD was used). Illumination times were 15, 12, 35, 12, 40, 24, 45, 4, and 9 min for 1 to 9, respectively. n.d., not detected (i.e., the amount of Z was below the NMR detection limit). (B) Evolution of absorption spectra during red-light irradiation of E-3 in water in the presence of 0.005 equiv. of (ps4)2⊂H. The resulting PSS contains 88% Z-3. (C) Selective switching of the negatively charged E-3 with 561-nm light in the presence of the positively charged E-5 on the micromolar scale. (D) UV-vis absorption spectra for the experiment shown in (C). Dotted line, PSS under 561-nm light; dashed line, PSS after the subsequent exposure to 365-nm light (where both azobenzenes isomerize to a similar extent). (E) Selective switching of E-4 with 561-nm light and (ps2)2⊂H in the presence of a UV-dimerizable anthracene. (F) UV-vis absorption spectra for the experiment shown in (E). Dotted line, PSS under 561-nm light; dashed line, after the subsequent exposure …

Having demonstrated that the positively charged E-5 can be successfully transformed into Z-5 despite its low affinity for the like-charged H, we speculated that other water-soluble azobenzenes might also be efficiently disequilibrated with a substoichiometric amount of not only the PS but also the host. Figure 5B shows the result of an experiment in which an aqueous solution of the negatively charged 3 was exposed to red light in the presence of 0.005 equiv. of (ps4)2⊂H. The absorption spectrum of this solution is dominated by the intense absorption peak of E-3 in the near-UV region; the small amount of the sensitizer appears as a weak band at ~600 nm (Fig. 5B). Remarkably, exciting this band with low-intensity 635-nm light resulted in a near-complete disappearance of the much more prominent and distant peak originating from another species (E-3). We found that the PSS contained 88% Z-3 (versus ~0% in the absence of either H or ps4), which indicated that each molecule of H hosted more than 180 E→Z isomerization events on average.


Photoswitching selectivity enabled by DESC


To further demonstrate the potential of DESC, we exploited the charge (+12) and cavity size of host H to discriminate between photoreactive compounds with overlapping absorption bands, which otherwise cannot be converted selectively. To this end, we mixed E-3 and E-5 in a 1:3 ratio and added (ps2)2⊂H (0.5 equiv. with respect to 3) (Fig. 5C). At low (micromolar) concentrations, only the negatively charged 3 exhibits affinity for H; 5 is not encapsulated owing to Coulombic repulsion. Indeed, yellow-light (561 nm) illumination of this mixture led to highly selective switching of E-3 (despite the threefold excess of E-5) (Fig. 5D and fig. S104). By contrast, exposure to UV light induced nonselective isomerization of both azobenzenes by direct excitation. In the second example, we worked with a mixture of E-4 and 9-bromoanthracene, both in the form of homodimers encapsulated within H. Upon exposure to UV light, the encapsulated anthracene rapidly dimerizes to afford the corresponding dianthracene (36) under the same irradiation conditions that trigger the direct E→Z photoisomerization of 4 (Fig. 5E, dashed arrow). However, exposing the same mixture to yellow light in the presence of (ps2)2⊂H induced highly selective photoisomerization of the azobenzene, leaving the anthracene intact (Fig. 5F, dotted line).


Discussion 


From a thermodynamic perspective, our sys- 
tem acts as a light-driven supramolecular ma- 
chine that converts light into chemical energy 
in the form of out-of-equilibrium photosta- 
tionary states. DESC relies on the selective 
coencapsulation of the stable E isomer of an 
azobenzene with a dye that acts as an antenna, 
absorbing visible light energy that is ultimately 
used to generate the metastable Z isomer. The 
absorption of light promotes the dye from the 
ground state to the singlet excited state. Confinement inside the host increases the dye's ability to undergo intersystem crossing, populating the dye's triplet state and turning it into a potential triplet sensitizer. Quantum chemical simulations reveal that although the triplet state of azobenzene lies higher in energy than that of the dye, a small dihedral angle twist in
azobenzene lowers its triplet energy while in- 
creasing the triplet energy of the coencapsu- 
lated dye to the extent that the two energy levels 
converge. Therefore, the dye-to-azobenzene 
triplet-energy transfer can become favorable 
owing to azobenzene dynamics (25, 44). Once 
in the triplet state, the azobenzene can either 
dissipate energy or switch to the Z isomer. 
Z-azobenzene is nonplanar and can no longer 
be co-confined with the photosensitizer; thus, 
it is expelled from the host and cannot be re- 
sensitized. In this way, DESC shifts the equi- 
librium toward the metastable Z state without 
the need to populate azobenzene’s singlet ex- 
cited state, which is relatively high in energy 
and requires the absorption of UV light. 
Although we focused on a particular host 
and one class of photoswitchable molecules 
(H and azoarenes, respectively), our results 
allow us to establish general design principles 
for other DESC systems: (i) The host should have 
an affinity for a photoswitch and a photosen- 
sitizer, and its cavity should be large enough to 
simultaneously encapsulate the photosensitizer 
and the thermodynamically stable isomer of 
the photoswitch; (ii) the host’s affinity for the 
metastable form of the photoswitch must be 
substantially lower (because of its different 
shape and/or polarity) so as to render coencap- 
sulation with the photosensitizer unfavorable; 
and (iii) an open cavity and/or conformational 
flexibility should promote rapid guest-binding and -release kinetics by the host for fast catalytic turnover and to ensure that, once generated, the metastable form of the photoswitch is expelled from the cavity before it is resensitized.

In principle, DESC is applicable to other classes 
of photoswitchable compounds, although larger 
or differently shaped hosts may be required. 
Self-assembly through metal-ligand coordination 
offers an attractive approach to generating a 
wide range of hosts from simple components 
in a modular fashion. 

As demonstrated in this work, DESC is a ro- 
bust process that works with catalytic amounts 
of sensitizers, under ambient conditions (no 
oxygen exclusion necessary), and for a wide 
range of azobenzenes and heterocyclic azoarenes. 
We envision that DESC will become a powerful 
tool to control chemical reactivity through a com- 
bination of light irradiation and confinement. 
My failure comeback

By Lan Nguyen Chaplin


I was driving when my phone alerted me to a new email. Filled with eager anticipation, I pulled
over, turned on my hazard lights, and opened it. My emotions quickly changed as I learned, for the 
sixth and final time, that I had been denied a promotion to full professor. I was devastated that my 
institution didn’t seem to value what I brought to the table. But when I told my family that night, 
my children offered a surprisingly upbeat response. They were excited to see what I was going 
to do next, they said. They apparently knew long before I did that losing my bid for a promotion 
would turn out to be the best thing that could have happened for me. 


This had been the final step in a 
long, arduous process spanning 
15 months. I had started by study- 
ing successful promotion bids and 
asking senior scholars for frank 
discussions about my readiness. 
I had meticulously prepared my 
application packet, summarizing 
everything I had accomplished in my 
career. But after I submitted my ap- 
plication, every few months I heard 
the same thing: The votes were not 
in my favor. After each “no” I could 
have withdrawn my case, reading 
the writing on the wall that it was 
unlikely to be successful, but I re- 
fused to back down. So, despite the 
negative votes, my case proceeded up 
through the university’s bureaucracy, 
ending with a failed appeal. 

To my surprise, having a final 
answer brought a welcome sense of 
closure. Once it was all over, I realized how debilitating the 
process had been. For more than a year, I had spent hours 
every day trying to prove my worth to my university. I was 
exasperated. I was underweight. My self-worth was at an all- 
time low. I just wanted to regain my health and happiness. 

I thought about looking for a job elsewhere at an institution 
that would appreciate me. But I was so exhausted from trying 
to convince the institution I had served for nearly 9 years of 
my merit that I could not muster the energy. As a first step 
toward healing, I wrote a letter of resignation—but instead 
of sending it, I saved it on my desktop, ready to attach to an 
email at any moment. I also kept a journal to process my feel- 
ings and plan my next steps. In doing so, I realized I love work- 
ing in academia too much to quietly quit. Instead, I decided 
to carry on in my existing position for a while. I vowed to pri- 
oritize my own values, regardless of what my department or 
institution expected of me. After all, I had spent years allowing 
those expectations to guide me and ended up frustrated—so I 
might as well follow my own internal compass instead. 




I began to say no to work that 
wasn’t personally rewarding so 
I would have more time to spend 
with my children, exercise, eat well, 
and sleep more. I learned how to 
meditate. I disconnected from 
people in my life who violated my 
values, and cultivated my relation- 
ships with those who share my 
priorities and bring out the best 
in me. I founded a nonprofit that 
helps first-generation and low- 
income students and young profes- 
sionals advance in the workforce 
while serving their community. The 
initiative had long been a dream of 
mine, but I never pursued it be- 
cause typical academic hiring and 
promotion rubrics don’t reward 
such efforts. Now, such consider- 
ations were no longer my North 
Star. I felt liberated to redefine suc- 
cess as something more than a title, rank, or salary. 

As my mental health improved thanks to these changes, I 
plunged back into my academic work. I ramped up projects 
that aligned with my values. I reassessed my institution for 
fit with my career goals and concluded it was no longer serv- 
ing me well. I tapped my network of friends and academic 
contacts, who gave me a place to vent, helped me strategize 
an exit plan, and offered to write letters of recommendation 
for other jobs. 

Five months after that final email from top leadership, I 
found myself in the car again, experiencing another career- 
defining moment. The phone rang, and this time it brought 
a job offer with a promotion to full professor from a uni- 
versity that values what I have to offer. I may have lost my 
bid for a big promotion, but in the end, it brought me to the 
right place.


Lan Nguyen Chaplin is a full professor at Northwestern University. Send 
your career story to SciCareerEditor@aaas.org. 